llama.cpp

Port of LLaMA in C/C++ for CPU and mixed CPU/GPU inference

Overview

llama.cpp runs large language models on consumer hardware without requiring a high-end GPU. It focuses on CPU inference, Apple Silicon optimization, and mixed CPU/GPU execution, and it loads models stored in the GGUF format.
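
To make the GGUF reference more concrete, here is a minimal sketch that checks whether a byte buffer starts with a GGUF header and reads its fixed-size fields. It is not part of llama.cpp. It assumes the GGUF version 2 or later layout: a 4-byte magic "GGUF", a little-endian uint32 version, then a uint64 tensor count and a uint64 metadata key/value count. The function name read_gguf_header and the returned dictionary keys are hypothetical, chosen for illustration only.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size fields at the start of a GGUF file.

    Assumption: GGUF version >= 2, where both counts are 64-bit.
    Only the fixed header is read; metadata and tensors are not parsed.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")

    # "<" selects little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    if version < 2:
        raise ValueError("this sketch assumes GGUF version 2 or later")

    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header built in memory, so no model file is needed.
    sample = struct.pack("<4sIQQ", GGUF_MAGIC, 3, 2, 5)
    print(read_gguf_header(sample))
```

To inspect a real model, read the first 24 bytes of the file in binary mode and pass them to read_gguf_header. Anything beyond these fields (metadata entries, tensor descriptors, tensor data) is outside the scope of this sketch.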

TODO: Add details on GGUF and usage.