3 Open Issues Need Help Last updated: Jul 9, 2025


Readme Documentation about 1 month ago

AI Summary: Review the README for the SAMS project and improve it where needed, with a focus on clarifying the dataset overview given in a separate issue. This means understanding the reported statistics on student applications and their distribution, then deciding whether the README needs changes to explain that data better.

Complexity: 2/5
documentation, help wanted

AI Summary: Debug and improve how data is saved in the SAMS data processing pipeline, which is written in Python with Pandas and possibly Dask or Pandarallel for parallel processing. The work involves understanding how the custom `save_data` function differs from `to_parquet`, resolving memory errors that occur during saving (especially with JSON-like data), and integrating the save step cleanly into the pipeline. A fix likely involves more memory-efficient data formats or chunked writes, plus a clear statement of the intended saving workflow for the project.

Complexity: 4/5
bug help wanted
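
The chunking strategy mentioned in the summary above can be sketched as follows. This is a minimal illustration, not the project's `save_data` function: the names `iter_chunks` and `save_in_parts`, the `part-00000.jsonl` file naming, and the use of JSON Lines instead of Parquet are all assumptions made to keep the sketch dependency-free. The point is only that each write serializes one bounded slice of the data rather than the whole dataset at once.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_chunks(records: list[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most chunk_size records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def save_in_parts(records: list[dict], out_dir, chunk_size: int = 1000) -> list[Path]:
    """Write records to numbered part files, one chunk per file (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(iter_chunks(records, chunk_size)):
        path = out_dir / f"part-{index:05d}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            # One JSON document per line; only this chunk is serialized here.
            for record in chunk:
                handle.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths


# Small demonstration with made-up records.
demo_records = [{"id": i, "payload": {"k": i}} for i in range(25)]
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = save_in_parts(demo_records, tmp, chunk_size=10)
    demo_names = [p.name for p in demo_paths]
    demo_last_lines = demo_paths[-1].read_text(encoding="utf-8").splitlines()
```

In a Pandas pipeline the same loop shape would apply, with each slice of the DataFrame written by `DataFrame.to_parquet` to its own part file; whether that resolves the memory errors in SAMS would need to be checked against the actual data.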

AI Summary: Optimize the processing of large JSON columns in a dataset. The current Pandas-based approach is slow. The user asks for feedback on a refactored approach that processes the data in chunks and stores it as Parquet files for better performance and scalability, and for advice on handling large nested JSON fields and on best practices for analyzing related datasets.

Complexity: 4/5
help wanted
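
As an illustration of the chunked handling of nested JSON described in the summary above, here is a minimal standard-library sketch. It is not the user's refactored code: the helper names `flatten` and `parse_in_chunks`, the dotted-key naming, and the `"value"` wrapper for non-object JSON are assumptions chosen for the example.

```python
import json
from itertools import islice
from typing import Any, Iterable, Iterator


def flatten(obj: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level with dotted keys; lists stay as values."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name))
        else:
            # Scalars, lists, and empty dicts are stored unchanged.
            flat[name] = value
    return flat


def parse_in_chunks(rows: Iterable[str], chunk_size: int) -> Iterator[list[dict]]:
    """Parse raw JSON strings in batches so only one batch is held in memory."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        parsed_batch = []
        for raw in batch:
            parsed = json.loads(raw)
            if not isinstance(parsed, dict):
                # Assumption: wrap non-object JSON under a single "value" key.
                parsed = {"value": parsed}
            parsed_batch.append(flatten(parsed))
        yield parsed_batch


# Made-up rows standing in for a JSON column.
sample_rows = [
    '{"a": 1, "b": {"c": 2, "d": {"e": 3}}}',
    '{"a": 4, "tags": ["x"]}',
    "[1, 2]",
]
sample_chunks = list(parse_in_chunks(sample_rows, chunk_size=2))
```

Each yielded batch is a list of flat dictionaries, which could then be turned into a DataFrame and written to its own Parquet file. Because `rows` can be any iterable, the input may come from a file read line by line rather than from a column already loaded in memory.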