Open Issues Need Help
Different post-training techniques for LLMs, including: SFT, DPO and Online RL
Python
#alignment #dpo #fine-tuning #huggingface #huggingface-transformers #llm #pytorch #reinforcement-learning #sft #trl
Issue 1: Configuration Management (3 months ago)
Labels: enhancement, good first issue