TRL (Transformer Reinforcement Learning)

Hugging Face library for RLHF, SFT, DPO, and post-training AI agents

Overview

TRL (Transformer Reinforcement Learning) is a Hugging Face library designed to train transformer language models with Reinforcement Learning. It covers the full post-training pipeline: from Supervised Fine-tuning (SFT) to Reward Modeling (RM), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).

Key Features

  • SFTTrainer: A wrapper around Hugging Face Trainer for easy Supervised Fine-Tuning.
  • DPOTrainer: For Direct Preference Optimization.
  • PPOTrainer: For classical RLHF pipelines.

Agent Post-Training

TRL is increasingly being used to fine-tune models to act as Agents (tool use, reasoning paths, step-by-step thinking). By using datasets of function-calling and agentic behavior, you can use TRL to post-train base models to explicitly output structured commands and reflect on tool outputs.

Examples & Notebooks

TODO: Add specific code snippets for SFTTrainer and post-training agentic behaviors.