Different post-training techniques for LLMs, including SFT, DPO, and online RL

alignment dpo fine-tuning huggingface huggingface-transformers llm pytorch reinforcement-learning sft trl
2 open issues need help. Last updated: Sep 10, 2025

Labels: enhancement, good first issue

Python