GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

514 stars · 40 forks · 514 watchers · Python · Apache License 2.0
ai4science data-generation data-synthesis knowledge-graph llama-factory llm llm-training pretrain pretraining qa question-answering qwen sft sft-data xtuner
2 open issues need help · Last updated: Nov 19, 2025



AI Summary: This GitHub issue identifies code redundancy in the `tests/e2e_tests` directory: the scripts generated for the various task types contain duplicated code blocks. The proposed fix is to refactor these duplicated blocks into a single shared method, reducing repetition and improving maintainability.

Complexity: 2/5
good first issue
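The refactor the issue proposes could look roughly like the sketch below. All names, task types, and signatures here are hypothetical, since the actual `tests/e2e_tests` code is not shown; the point is only to illustrate pulling a duplicated setup/run sequence into one shared helper.

```python
# Hypothetical sketch: several per-task e2e scripts repeat the same
# configure-and-run boilerplate. A shared driver removes the duplication.

def run_e2e_task(task_type: str, input_file: str, output_dir: str) -> dict:
    """Shared driver for an e2e test task (illustrative signature only)."""
    config = {
        "task": task_type,
        "input": input_file,
        "output": f"{output_dir}/{task_type}_result.json",
    }
    # ... common generation/validation steps would go here ...
    return config

# Each task-specific script then reduces to a one-line call:
atomic_cfg = run_e2e_task("atomic", "data/sample.jsonl", "cache")
cot_cfg = run_e2e_task("cot", "data/sample.jsonl", "cache")
```

In a pytest-based suite, the same idea is often expressed as a shared fixture or a parametrized test rather than a helper function, which would also satisfy the issue's goal.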
