Transformers | Rohit Kumar | rohit.vision

The Transformer Architecture (WIP)

Self-Attention, Multi-Head Attention, and the Encoder-Decoder structure

Titans (Google Research) (WIP)

Learning to memorize at test time and deep memory architectures