Synthetic Data Generation Toolkit for LLMs

52 stars | 29 forks | 52 watchers | Python | Apache License 2.0
8 open issues need help | Last updated: Sep 12, 2025

Open Issues Need Help

AI Summary: Enhance the `flow.get_info()` method of the `sdg_hub` Python library to make its output more readable. The current output is a large nested dictionary. The task is to add options for richer formatting, potentially mirroring the original YAML structure, and to let users selectively view sections such as metadata or blocks.

Complexity: 3/5
good first issue
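
One way to picture the request is the sketch below. It is not `sdg_hub` code: `format_info`, its `sections` parameter, and the sample dictionary are hypothetical, and the renderer only imitates YAML layout for plain dicts, lists, and scalars using the standard library.

```python
from typing import Any, Iterable, List, Optional


def _render(value: Any, indent: int = 0) -> List[str]:
    """Render nested dicts/lists as indented, YAML-like lines."""
    pad = "  " * indent
    lines: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{key}:")
                lines.extend(_render(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)) and item:
                child = _render(item, indent + 1)
                # Move the first child line onto the "- " marker line.
                lines.append(f"{pad}- {child[0].lstrip()}")
                lines.extend(child[1:])
            else:
                lines.append(f"{pad}- {item}")
    else:
        lines.append(f"{pad}{value}")
    return lines


def format_info(info: dict, sections: Optional[Iterable[str]] = None) -> str:
    """Hypothetical helper: pretty-print all of `info`, or only chosen sections."""
    if sections is not None:
        wanted = list(sections)
        missing = [name for name in wanted if name not in info]
        if missing:
            raise KeyError(f"unknown section(s): {missing}")
        info = {name: info[name] for name in wanted}
    return "\n".join(_render(info))


# Assumed shape of the dictionary; the real get_info() output may differ.
sample = {
    "metadata": {"name": "demo", "version": "1.0"},
    "blocks": [{"block_name": "a", "type": "LLM"}],
}
print(format_info(sample, sections=["metadata"]))
```

Filtering by section before rendering keeps the renderer itself unaware of which keys exist, so new top-level sections would need no formatting changes.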

AI Summary: Remove the redundant `BlockRegistry.info()` method, which duplicates the functionality of `BaseBlock.get_info()`. This requires updating the codebase to use `BaseBlock.get_info()` consistently and updating the relevant documentation.

Complexity: 4/5
good first issue

AI Summary: Implement an optional `log_dir` parameter in the `Flow.generate()` method of the `sdg_hub` Python library. This parameter should allow users to specify a directory to save execution logs to, in addition to the existing console output. The implementation should handle file creation, timestamping to prevent overwrites, and maintain backward compatibility.

Complexity: 3/5
enhancement good first issue
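
The behavior described above could look roughly like the following sketch. It uses only the standard `logging` module. The `generate` function here is a stand-in, not the real `Flow.generate()`, and the logger name, file-name pattern, and helper `_add_file_handler` are assumptions made for illustration.

```python
import logging
from datetime import datetime
from pathlib import Path
from typing import Optional

logger = logging.getLogger("flow_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)


def _add_file_handler(log_dir: str) -> logging.FileHandler:
    """Create log_dir if needed and attach a timestamped file handler."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Microsecond timestamp so repeated runs do not overwrite earlier logs.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    handler = logging.FileHandler(directory / f"flow_{stamp}.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler


def generate(rows: list, log_dir: Optional[str] = None) -> list:
    """Stand-in for a flow run; log_dir=None keeps the old behavior."""
    handler = _add_file_handler(log_dir) if log_dir is not None else None
    try:
        logger.info("processing %d rows", len(rows))
        return list(rows)
    finally:
        # Detach and close the file handler so later runs start clean.
        if handler is not None:
            logger.removeHandler(handler)
            handler.close()
```

Because the file handler is added alongside any existing handlers and removed afterwards, console logging is left as it was, and callers who never pass `log_dir` see no change.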

AI Summary: Implement a system to generate short, human-readable IDs (slugs) for flows in the `sdg_hub` Python framework. This will improve the user experience by simplifying the process of loading flows, replacing the current method of using long, potentially error-prone full names.

Complexity: 3/5
good first issue
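
As a rough illustration of slug generation, the sketch below derives a short ID from a full flow name and resolves collisions with a numeric suffix. The function names, the 40-character cap, and the example flow name are all hypothetical; the issue does not specify how `sdg_hub` should build or store these IDs.

```python
import re
import unicodedata


def slugify(name: str, max_length: int = 40) -> str:
    """Lowercase, ASCII-only, hyphen-separated version of a flow name."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def unique_slug(name: str, taken: set) -> str:
    """Return a slug not already in `taken`, adding -2, -3, ... on collision."""
    base = slugify(name) or "flow"
    candidate, counter = base, 2
    while candidate in taken:
        candidate = f"{base}-{counter}"
        counter += 1
    taken.add(candidate)
    return candidate


registry = set()
# Made-up flow name, used only to show the transformation.
print(unique_slug("Advanced Document QA (Multi-Hop) Flow", registry))
```

Deriving the slug deterministically from the name keeps IDs predictable, while the collision suffix covers the case where two names normalize to the same string.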
