rohit.vision

Optimization & Training

Optimizers, loss functions, activation functions, schedulers, regularization, and inference

1. Loss Functions
   MSE, Cross-Entropy, Focal, Triplet, Contrastive, KL Divergence and more
2. Activation Functions
   Sigmoid, Tanh, ReLU, GeLU, Swish and other activation functions with derivatives
3. Optimizers
   SGD, Momentum, Nesterov, AdaGrad, RMSProp, Adam with update rules
4. Regularization
   L1, L2, Elastic Net, Dropout, Early Stopping and other regularization techniques
5. Learning Rate Schedulers
   StepLR, MultiStepLR, ExponentialLR scheduling strategies
6. Compute Optimization Techniques (WIP)
   Sequence packing and efficient transformer implementations
7. Distributed Training Paradigms (WIP)
   Tensor Parallelism, Sequence Parallelism, Pipeline Parallelism, and RingAttention
8. PyTorch Lightning (WIP)
   High-level framework for organizing PyTorch training code
9. ONNX & TensorRT (WIP)
   Exporting and optimizing deep learning models for production
10. Inference Optimization & Decoding (WIP)
    Latency vs throughput, decoding strategies, Speculative Decoding, and Stateful Caching
11. JEPA (Joint Embedding Predictive Architecture) (WIP)
    Yann LeCun's vision for autonomous machine intelligence: I-JEPA, V-JEPA, and EchoJEPA

© 2026 Rohit Kumar. rohit.vision