Optimization & Training
Optimizers, loss functions, activation functions, schedulers, regularization, and inference
1.
Loss Functions
MSE, Cross-Entropy, Focal, Triplet, Contrastive, KL Divergence and more
2.
Activation Functions
Sigmoid, Tanh, ReLU, GELU, Swish, and other activation functions, with their derivatives
3.
Optimizers
SGD, Momentum, Nesterov, AdaGrad, RMSProp, and Adam, with update rules
4.
Regularization
L1, L2, Elastic Net, Dropout, Early Stopping and other regularization techniques
5.
Learning Rate Schedulers
StepLR, MultiStepLR, and ExponentialLR scheduling strategies
6.
Compute Optimization Techniques
WIP
Sequence packing and efficient transformer implementations
7.
Distributed Training Paradigms
WIP
Tensor Parallelism, Sequence Parallelism, Pipeline Parallelism, and RingAttention
8.
PyTorch Lightning
WIP
High-level framework for organizing PyTorch training code
9.
ONNX & TensorRT
WIP
Exporting and optimizing deep learning models for production
10.
Inference Optimization & Decoding
WIP
Latency vs throughput, decoding strategies, Speculative Decoding, and Stateful Caching
11.
JEPA (Joint Embedding Predictive Architecture)
WIP
Yann LeCun's vision for autonomous machine intelligence: I-JEPA, V-JEPA, and EchoJEPA