Overview
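AirLLM is a library for running very large language models, such as 70B-parameter LLMs, on a single consumer GPU with limited VRAM (e.g., 8GB). Rather than holding the whole model in GPU memory at once, it executes the model layer by layer and swaps each layer's weights into and out of VRAM as they are needed.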
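A back-of-the-envelope calculation shows why a 70B model cannot simply be loaded onto an 8GB card, and why working one layer at a time helps. The sketch below rests on illustrative assumptions: 16-bit weights (2 bytes per parameter), parameters split evenly across 80 layers (the decoder-layer count of Llama 2 70B), and GB meaning 10^9 bytes. It ignores embeddings, activations, and cached state, so the figures are lower bounds, not measurements of AirLLM.

```python
# Rough weight-memory estimate for a 70B-parameter model.
# Illustrative assumptions: 2 bytes per parameter (fp16/bf16),
# parameters split evenly across 80 layers, 1 GB = 10**9 bytes.
PARAMS = 70e9
BYTES_PER_PARAM = 2
NUM_LAYERS = 80
GB = 1e9


def weight_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the memory needed to store num_params weights, in GB."""
    return num_params * bytes_per_param / GB


full_model = weight_gb(PARAMS)              # every layer resident at once
one_layer = weight_gb(PARAMS / NUM_LAYERS)  # only one layer resident

print(f"Whole model: {full_model:.1f} GB")  # Whole model: 140.0 GB
print(f"One layer:   {one_layer:.2f} GB")   # One layer:   1.75 GB
```

Under these assumptions the full set of weights is far larger than 8GB, while a single layer fits comfortably, which is the gap that layer-wise execution exploits.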
Mechanism
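Instead of loading the whole model into GPU memory, AirLLM runs inference one layer at a time. For each layer, it loads that layer's weights into VRAM, runs the layer on the current activations, and releases the weights before loading the next layer. Peak VRAM use is therefore governed by the size of a single layer plus the activations, not by the size of the full model. The trade-off is speed: every forward pass must move every layer's weights onto the GPU again, so generation is much slower than with a model that fits entirely in VRAM.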
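The layer-wise execution and swapping described in the Overview can be illustrated with a toy simulation. This is not AirLLM's code: `DeviceMemory`, `Layer`, and `run_layerwise` are hypothetical names, the "GPU" is just a byte counter, and each "layer" is an elementwise scale standing in for a transformer block. What it does show is the core idea: if each layer is loaded, run, and freed before the next one is loaded, peak memory equals one layer's weights even though the model as a whole would not fit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeviceMemory:
    """Toy stand-in for GPU memory: tracks current and peak bytes in use."""
    capacity: int
    used: int = 0
    peak: int = 0

    def allocate(self, nbytes: int) -> None:
        if self.used + nbytes > self.capacity:
            raise MemoryError(f"need {nbytes} B, only {self.capacity - self.used} B free")
        self.used += nbytes
        self.peak = max(self.peak, self.used)

    def free(self, nbytes: int) -> None:
        self.used -= nbytes


@dataclass
class Layer:
    """Hypothetical layer: scales each input element by its weight."""
    weights: List[float]
    bytes_per_weight: int = 2  # assume 16-bit weights

    @property
    def nbytes(self) -> int:
        return len(self.weights) * self.bytes_per_weight

    def forward(self, x: List[float]) -> List[float]:
        return [w * v for w, v in zip(self.weights, x)]


def run_layerwise(load_layer: Callable[[int], Layer], num_layers: int,
                  x: List[float], device: DeviceMemory) -> List[float]:
    """Run layers one at a time so only one layer's weights are resident.

    Activation memory is not tracked here; only weights are counted.
    """
    for i in range(num_layers):
        layer = load_layer(i)          # fetch layer i from "disk"
        device.allocate(layer.nbytes)  # copy its weights onto the "GPU"
        try:
            x = layer.forward(x)
        finally:
            device.free(layer.nbytes)  # release before the next layer loads
    return x


DIM = 4
NUM_LAYERS = 10
# Every layer doubles its input; weights stay "on disk" until requested.
disk = [[2.0] * DIM for _ in range(NUM_LAYERS)]


def load(i: int) -> Layer:
    return Layer(weights=disk[i])


# The device has room for exactly one layer (4 weights x 2 bytes = 8 bytes),
# whereas holding all 10 layers at once would need 80 bytes.
device = DeviceMemory(capacity=DIM * 2)
out = run_layerwise(load, NUM_LAYERS, [1.0] * DIM, device)
print(out)          # [1024.0, 1024.0, 1024.0, 1024.0]
print(device.peak)  # 8
```

Freeing the weights in a `finally` clause keeps the memory accounting correct even if a layer's forward pass raises. The loop also makes the speed cost visible: every call to `run_layerwise` reloads every layer, which mirrors why layer-wise inference trades throughput for a small memory footprint.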
Usage
TODO: Add content on usage.