GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

514 stars · 40 forks · 514 watchers · Python · Apache License 2.0
ai4science data-generation data-synthesis knowledge-graph llama-factory llm llm-training pretrain pretraining qa question-answering qwen sft sft-data xtuner
2 open issues need help · Last updated: Nov 19, 2025



AI Summary: This GitHub issue identifies code redundancy in the `tests/e2e_tests` directory: the scripts generated for the various task types contain duplicated code blocks. The proposed fix is to refactor these duplicated blocks into a single shared method, reducing repetition and improving maintainability.

Complexity: 2/5
good first issue
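The refactor the issue proposes could look roughly like the sketch below. All names, task types, and signatures here are hypothetical, since the actual `tests/e2e_tests` code is not shown; the point is only to illustrate pulling a duplicated setup/run sequence into one shared helper.

```python
# Hypothetical sketch: several per-task e2e scripts repeat the same
# configure-and-run boilerplate. A shared driver removes the duplication.

def run_e2e_task(task_type: str, input_file: str, output_dir: str) -> dict:
    """Shared driver for an e2e test task (illustrative signature only)."""
    config = {
        "task": task_type,
        "input": input_file,
        "output": f"{output_dir}/{task_type}_result.json",
    }
    # ... common generation/validation steps would go here ...
    return config

# Each task-specific script then reduces to a one-line call:
atomic_cfg = run_e2e_task("atomic", "data/sample.jsonl", "cache")
cot_cfg = run_e2e_task("cot", "data/sample.jsonl", "cache")
```

In a pytest-based suite, the same idea is often expressed as a shared fixture or a parametrized test rather than a helper function, which would also satisfy the issue's goal.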
