Efficiently fit ~100 scipy.stats distributions to your data using Spark's parallel processing with optimized Pandas UDFs and broadcast variables.

#distribution #python #spark
3 Open Issues Need Help · Last updated: Dec 29, 2025


AI Summary: This issue proposes adding a progress tracking feature for long-running distribution fits. Currently, users lack visibility into the progress when fitting many distributions to large datasets. The proposed solution involves implementing a progress callback or logging mechanism to show which distributions have been fitted and provide an estimated time remaining.

Complexity: 3/5 · Labels: enhancement, good first issue
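A progress callback for fitting many distributions could look something like the sketch below. The `fit_all` function, its signature, and the callback shape are hypothetical illustrations of the proposal, not the project's actual API; the estimated time remaining is a simple linear extrapolation from elapsed time.

```python
import time

import numpy as np
from scipy import stats


def fit_all(data, dist_names, progress_callback=None):
    """Fit each named scipy.stats distribution to `data`.

    Illustrative sketch of the proposed progress mechanism: after each
    fit, invoke `progress_callback(name, done, total, eta_seconds)`.
    """
    results = {}
    start = time.monotonic()
    for i, name in enumerate(dist_names, 1):
        dist = getattr(stats, name)
        results[name] = dist.fit(data)
        if progress_callback is not None:
            elapsed = time.monotonic() - start
            # Naive ETA: assume remaining fits take as long, on average,
            # as the ones completed so far.
            eta = elapsed / i * (len(dist_names) - i)
            progress_callback(name, i, len(dist_names), eta)
    return results


data = stats.norm.rvs(size=500, random_state=0)
fits = fit_all(
    data,
    ["norm", "expon", "gamma"],
    progress_callback=lambda name, done, total, eta: print(
        f"[{done}/{total}] fitted {name}, ~{eta:.1f}s remaining"
    ),
)
```

A logging-based variant would simply replace the callback with `logging.info` calls at the same point in the loop.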


AI Summary: This issue proposes adding Probability-Probability (P-P) plots to the project as a new feature. P-P plots are a visualization tool for goodness-of-fit, complementing existing Q-Q plots by focusing on the center of the distribution. The implementation should follow a similar API pattern to the existing `plot_qq()` function.

Complexity: 2/5 · Labels: enhancement, good first issue
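A P-P plot compares the fitted distribution's CDF evaluated at the sorted data against empirical plotting positions (i - 0.5)/n; points near the 45-degree line indicate a good fit in the center of the distribution. A minimal matplotlib-based sketch, assuming a helper named `plot_pp` that mirrors the existing `plot_qq()` (the name, signature, and use of matplotlib are assumptions):

```python
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")  # headless backend for non-interactive use
import matplotlib.pyplot as plt


def plot_pp(data, dist, params, ax=None):
    """Probability-Probability plot: empirical vs theoretical CDF.

    Hypothetical helper sketching the proposed feature; the project's
    real API may differ.
    """
    x = np.sort(np.asarray(data))
    n = len(x)
    emp = (np.arange(1, n + 1) - 0.5) / n  # empirical plotting positions
    theo = dist.cdf(x, *params)            # theoretical CDF under the fit
    ax = ax or plt.gca()
    ax.plot(theo, emp, "o", markersize=3)
    ax.plot([0, 1], [0, 1], "k--")         # 45-degree reference line
    ax.set_xlabel("Theoretical CDF")
    ax.set_ylabel("Empirical CDF")
    return ax


data = stats.norm.rvs(size=200, random_state=1)
params = stats.norm.fit(data)
ax = plot_pp(data, stats.norm, params)
```

Unlike a Q-Q plot, which compares quantiles and emphasizes the tails, the P-P plot's CDF-vs-CDF comparison is most sensitive near the middle of the distribution.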


AI Summary: This issue proposes adding new methods to directly export results to JSON or CSV formats. Currently, users need to convert results to a pandas DataFrame first, which this change aims to simplify by providing `results.to_csv(path)` and `results.to_json(path)`.

Complexity: 2/5 · Labels: enhancement, good first issue
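The proposed exporters could be sketched as methods on a results container using only the standard library, so no pandas round-trip is needed. `FitResults` and its row layout are illustrative stand-ins for the project's actual results class:

```python
import csv
import json
from dataclasses import dataclass, field


@dataclass
class FitResults:
    """Illustrative results container; the project's real class differs.

    `rows` holds one dict per fitted distribution, e.g.
    {"distribution": "norm", "aic": 120.5}.
    """
    rows: list = field(default_factory=list)

    def to_csv(self, path):
        # One row per distribution; header inferred from the first row's keys.
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(self.rows[0]))
            writer.writeheader()
            writer.writerows(self.rows)

    def to_json(self, path):
        with open(path, "w") as f:
            json.dump(self.rows, f, indent=2)


results = FitResults(rows=[
    {"distribution": "norm", "aic": 120.5},
    {"distribution": "gamma", "aic": 118.2},
])
results.to_csv("fits.csv")
results.to_json("fits.json")
```

This keeps the proposed `results.to_csv(path)` / `results.to_json(path)` call shape from the issue while leaving the existing pandas conversion path untouched.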
