Efficiently fit ~100 scipy.stats distributions to your data using Spark's parallel processing with optimized Pandas UDFs and broadcast variables.

distribution python spark
3 Open Issues Need Help
Last updated: Dec 29, 2025

AI Summary: This issue proposes adding progress tracking for long-running distribution fits. Currently, users have no visibility into progress while fitting many distributions to large datasets. The proposed solution is a progress callback or logging mechanism that shows which distributions have been fitted and reports an estimated time remaining.

Complexity: 3/5
enhancement good first issue
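
The progress callback described in the issue summary could look roughly like the sketch below. It is only an illustration of the idea, not the project's API: `fit_with_progress`, `fit_one`, `on_progress`, and `log_progress` are hypothetical names, and the loop runs sequentially in one process. A Spark-based implementation would have to collect completion events from executors, which this sketch does not attempt. The time remaining is estimated as the average time per completed fit multiplied by the number of fits still pending.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical callback signature: (name, done, total, eta_seconds) -> None
ProgressCallback = Callable[[str, int, int, float], None]


def fit_with_progress(
    names: Iterable[str],
    fit_one: Callable[[str], Any],
    on_progress: Optional[ProgressCallback] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Fit each named distribution in turn and report progress after each one.

    `fit_one` is an assumed user-supplied function that fits a single
    distribution by name and returns whatever result object is wanted.
    """
    names = list(names)
    total = len(names)
    results: Dict[str, Any] = {}
    start = clock()
    for done, name in enumerate(names, start=1):
        results[name] = fit_one(name)
        if on_progress is not None:
            elapsed = clock() - start
            # Average seconds per finished fit, times the fits still pending.
            eta_seconds = elapsed / done * (total - done)
            on_progress(name, done, total, eta_seconds)
    return results


def log_progress(name: str, done: int, total: int, eta_seconds: float) -> None:
    """Example callback that prints one line per fitted distribution."""
    print(f"[{done}/{total}] fitted {name}; about {eta_seconds:.1f}s remaining")
```

Passing `on_progress=log_progress` prints one line per fitted distribution. The `clock` parameter exists only so that the timing can be replaced in tests.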
