Overview
llama.cpp runs large language models on consumer hardware without requiring a high-end GPU. It focuses on CPU inference, Apple Silicon optimization, and mixed CPU/GPU execution, and it loads models stored in the GGUF format.
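To make the GGUF format a little more concrete, here is a minimal Python sketch that reads the fixed-size header at the start of a GGUF file. It assumes the layout used by GGUF version 2 and later (the 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count) and assumes little-endian byte order. The names read_gguf_header and make_fake_header are hypothetical helpers for illustration and are not part of llama.cpp.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FORMAT = "<IQQ"  # assumed: uint32 version, uint64 tensor count, uint64 KV count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes, no padding with "<"


def read_gguf_header(stream):
    """Read the magic and fixed header fields from a binary stream."""
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    raw = stream.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def make_fake_header(version=3, tensor_count=2, kv_count=5):
    """Build synthetic header bytes so the sketch runs without a real model file."""
    return GGUF_MAGIC + struct.pack(HEADER_FORMAT, version, tensor_count, kv_count)


if __name__ == "__main__":
    header = read_gguf_header(io.BytesIO(make_fake_header()))
    print(header)
```

To inspect a real model, the same function can be given a file opened in binary mode, for example open(path, "rb"), where path points at a local GGUF file. The metadata key/value pairs and tensor descriptors that follow the header are not parsed here.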
TODO: Add details on GGUF and usage.