Transformers

Transformer architecture and variants

1. The Transformer Architecture (WIP)
   Self-Attention, Multi-Head Attention, and the Encoder-Decoder structure

2. Titans (Google Research) (WIP)
   Learning to memorize at test time and deep memory architectures