the spatial ensemble (top) and the feature ensemble (bottom). The spatial ensemble strategy stitches multiple example images together and resizes the stitched result to the input resolution.
The feature ensemble strategy averages the features of the query image after each attention layer, so that the query image aggregates information from all the reference examples.
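
The two strategies described in the caption can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Images are plain 2D lists, features are flat lists of floats, and every name here (`stitch_grid`, `resize_nearest`, `spatial_ensemble`, `feature_ensemble`, and the `layer(example, query)` callable signature) is a hypothetical stand-in. The grid layout and the nearest-neighbour resizing are assumptions as well, since the caption only says that examples are stitched and resized.

```python
def stitch_grid(images, cols):
    """Tile equally sized 2D images (lists of rows) into a grid with `cols` columns."""
    if len(images) % cols != 0:
        raise ValueError("number of images must be a multiple of cols")
    stitched = []
    for start in range(0, len(images), cols):
        group = images[start:start + cols]
        for r in range(len(group[0])):
            # Concatenate row r of every image in this grid row.
            stitched.append([px for img in group for px in img[r]])
    return stitched


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (assumed; the caption does not name a method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def spatial_ensemble(examples, cols, out_h, out_w):
    """Stitch the examples, then resize the stitched image to the input resolution."""
    return resize_nearest(stitch_grid(examples, cols), out_h, out_w)


def feature_ensemble(example_feats, query_feat, layers):
    """Run one branch per example; average the query features after each layer.

    Each `layer` is a hypothetical callable taking (example, query) features
    and returning the updated (example, query) pair.
    """
    examples = [list(e) for e in example_feats]
    queries = [list(query_feat) for _ in example_feats]
    for layer in layers:
        outputs = [layer(e, q) for e, q in zip(examples, queries)]
        examples = [out[0] for out in outputs]
        n = len(outputs)
        # Element-wise mean of the query features across all branches.
        mean_q = [sum(vals) / n for vals in zip(*(out[1] for out in outputs))]
        # Every branch continues from the same averaged query features.
        queries = [list(mean_q) for _ in outputs]
    return queries[0]


def toy_layer(example, query):
    """Toy stand-in for an attention layer: the query absorbs the example."""
    return example, [q + e for q, e in zip(query, example)]
```

In this sketch, calling `spatial_ensemble` on four 2x2 examples with `cols=2` and a 2x2 target yields a single input in which each example occupies one quadrant, while `feature_ensemble` keeps every example at full resolution and mixes them only through the averaged query features.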