Efficiently fit ~100 scipy.stats distributions to your data using Spark's parallel processing with optimized Pandas UDFs and broadcast variables.

#distribution #python #spark
3 Open Issues Need Help · Last updated: Dec 29, 2025


AI Summary: This issue proposes adding a progress tracking feature for long-running distribution fits. Currently, users lack visibility into the progress when fitting many distributions to large datasets. The proposed solution involves implementing a progress callback or logging mechanism to show which distributions have been fitted and provide an estimated time remaining.

Complexity: 3/5 · Labels: enhancement, good first issue
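A progress callback for fitting many distributions could look something like the sketch below. The `fit_all` function, its signature, and the callback shape are hypothetical illustrations of the proposal, not the project's actual API; the estimated time remaining is a simple linear extrapolation from elapsed time.

```python
import time

import numpy as np
from scipy import stats


def fit_all(data, dist_names, progress_callback=None):
    """Fit each named scipy.stats distribution to `data`.

    Illustrative sketch of the proposed progress mechanism: after each
    fit, invoke `progress_callback(name, done, total, eta_seconds)`.
    """
    results = {}
    start = time.monotonic()
    for i, name in enumerate(dist_names, 1):
        dist = getattr(stats, name)
        results[name] = dist.fit(data)
        if progress_callback is not None:
            elapsed = time.monotonic() - start
            # Naive ETA: assume remaining fits take as long, on average,
            # as the ones completed so far.
            eta = elapsed / i * (len(dist_names) - i)
            progress_callback(name, i, len(dist_names), eta)
    return results


data = stats.norm.rvs(size=500, random_state=0)
fits = fit_all(
    data,
    ["norm", "expon", "gamma"],
    progress_callback=lambda name, done, total, eta: print(
        f"[{done}/{total}] fitted {name}, ~{eta:.1f}s remaining"
    ),
)
```

A logging-based variant would simply replace the callback with `logging.info` calls at the same point in the loop.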


AI Summary: This issue proposes adding Probability-Probability (P-P) plots to the project as a new feature. P-P plots are a visualization tool for goodness-of-fit, complementing existing Q-Q plots by focusing on the center of the distribution. The implementation should follow a similar API pattern to the existing `plot_qq()` function.

Complexity: 2/5 · Labels: enhancement, good first issue
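A P-P plot compares the fitted distribution's CDF evaluated at the sorted data against empirical plotting positions (i - 0.5)/n; points near the 45-degree line indicate a good fit in the center of the distribution. A minimal matplotlib-based sketch, assuming a helper named `plot_pp` that mirrors the existing `plot_qq()` (the name, signature, and use of matplotlib are assumptions):

```python
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")  # headless backend for non-interactive use
import matplotlib.pyplot as plt


def plot_pp(data, dist, params, ax=None):
    """Probability-Probability plot: empirical vs theoretical CDF.

    Hypothetical helper sketching the proposed feature; the project's
    real API may differ.
    """
    x = np.sort(np.asarray(data))
    n = len(x)
    emp = (np.arange(1, n + 1) - 0.5) / n  # empirical plotting positions
    theo = dist.cdf(x, *params)            # theoretical CDF under the fit
    ax = ax or plt.gca()
    ax.plot(theo, emp, "o", markersize=3)
    ax.plot([0, 1], [0, 1], "k--")         # 45-degree reference line
    ax.set_xlabel("Theoretical CDF")
    ax.set_ylabel("Empirical CDF")
    return ax


data = stats.norm.rvs(size=200, random_state=1)
params = stats.norm.fit(data)
ax = plot_pp(data, stats.norm, params)
```

Unlike a Q-Q plot, which compares quantiles and emphasizes the tails, the P-P plot's CDF-vs-CDF comparison is most sensitive near the middle of the distribution.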


AI Summary: This issue proposes adding new methods to directly export results to JSON or CSV formats. Currently, users need to convert results to a pandas DataFrame first, which this change aims to simplify by providing `results.to_csv(path)` and `results.to_json(path)`.

Complexity: 2/5 · Labels: enhancement, good first issue
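The proposed exporters could be sketched as methods on a results container using only the standard library, so no pandas round-trip is needed. `FitResults` and its row layout are illustrative stand-ins for the project's actual results class:

```python
import csv
import json
from dataclasses import dataclass, field


@dataclass
class FitResults:
    """Illustrative results container; the project's real class differs.

    `rows` holds one dict per fitted distribution, e.g.
    {"distribution": "norm", "aic": 120.5}.
    """
    rows: list = field(default_factory=list)

    def to_csv(self, path):
        # One row per distribution; header inferred from the first row's keys.
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(self.rows[0]))
            writer.writeheader()
            writer.writerows(self.rows)

    def to_json(self, path):
        with open(path, "w") as f:
            json.dump(self.rows, f, indent=2)


results = FitResults(rows=[
    {"distribution": "norm", "aic": 120.5},
    {"distribution": "gamma", "aic": 118.2},
])
results.to_csv("fits.csv")
results.to_json("fits.json")
```

This keeps the proposed `results.to_csv(path)` / `results.to_json(path)` call shape from the issue while leaving the existing pandas conversion path untouched.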
