Perceiver IO: A General Architecture for Structured Inputs & Outputs

August 2021

tl;dr: A Perceiver that generalizes beyond classification to dense prediction and multimodal output tasks.

Overall impression

The general architecture follows that of Perceiver, but the output side is more flexible, so Perceiver IO can handle tasks beyond classification.

The paper seems to be heavily inspired by DETR: it adds an output query to handle dense prediction tasks.

Both Perceiver and Perceiver IO are “fully attentional networks” (FAN?). They use a read-process-write architecture: inputs are read (encoded) into a latent space, processed by transformers, and then written (decoded) to produce outputs.
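The read-process-write flow can be sketched at the shape level in NumPy. This is a minimal illustration, not the paper's implementation: the names `cross_attention` and `perceiver_io_sketch` are hypothetical, weights are random rather than learned, attention is single-head, and layer norm and MLP blocks are omitted.

```python
import numpy as np


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to keys_values."""
    q = queries @ w_q                                # (n_q, dim)
    k = keys_values @ w_k                            # (n_kv, dim)
    v = keys_values @ w_v                            # (n_kv, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_kv)
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return weights @ v                               # (n_q, dim)


def perceiver_io_sketch(inputs, output_queries, num_latents=8, dim=16, depth=2, seed=0):
    """Shape-level sketch of read -> process -> write (random, untrained weights)."""
    rng = np.random.default_rng(seed)

    def proj(d_in):
        # Random projection standing in for a learned weight matrix (assumption).
        return rng.normal(scale=d_in ** -0.5, size=(d_in, dim))

    d_in = inputs.shape[1]
    d_query = output_queries.shape[1]

    # Latent array: learned parameters in a real model, random here.
    latents = rng.normal(size=(num_latents, dim))

    # Read: latents query the inputs (cross-attention).
    latents = cross_attention(latents, inputs, proj(dim), proj(d_in), proj(d_in))

    # Process: self-attention over the latents, with residual connections.
    for _ in range(depth):
        latents = latents + cross_attention(
            latents, latents, proj(dim), proj(dim), proj(dim)
        )

    # Write: output queries attend to the latents (cross-attention).
    return cross_attention(output_queries, latents, proj(d_query), proj(dim), proj(dim))


inputs = np.ones((50, 3))          # 50 input elements, 3 features each
output_queries = np.ones((20, 4))  # 20 output positions, 4-dim query features
outputs = perceiver_io_sketch(inputs, output_queries)
print(outputs.shape)
```

Note that the number of outputs is set by the number of output queries, independent of the input size and the number of latents. This is what lets the same decoder produce one vector per pixel or per token for dense prediction, rather than a single classification vector.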

Key ideas

Technical details