scikit-sampling: Statistical dataset sampling in Python

data-science machine-learning python sampling statistics
5 Open Issues Need Help
Last updated: Jul 18, 2025


AI Summary: Implement a stratified sampling method in the scikit-sampling Python library. This involves creating a new function or class that takes a dataset and a stratification column as input, proportionally samples from each stratum, uses a random seed for reproducibility, and includes comprehensive documentation with examples.

Complexity: 3/5
good first issue
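
The behaviour this issue asks for can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's API: the name `stratified_sample`, its parameters (`strata_key`, `frac`, `random_state`), and the list-of-dicts input are all hypothetical. It groups rows by the stratification column, draws the same fraction from each stratum, and uses a seeded generator so that results repeat.

```python
import random
from collections import defaultdict


def stratified_sample(rows, strata_key, frac, random_state=None):
    """Hypothetical helper: draw about `frac` of the rows from each stratum."""
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    rng = random.Random(random_state)  # seeded generator for reproducibility

    # Group row indices by the value found in the stratification column.
    strata = defaultdict(list)
    for index, row in enumerate(rows):
        strata[row[strata_key]].append(index)

    picked = []
    for label in sorted(strata, key=repr):  # fixed order keeps runs repeatable
        members = strata[label]
        # Proportional allocation, keeping at least one row per stratum.
        n = max(1, round(len(members) * frac))
        picked.extend(rng.sample(members, n))

    # Return the chosen rows in their original order.
    return [rows[i] for i in sorted(picked)]


# Toy dataset: 80 rows in group "a", 20 rows in group "b".
rows = [{"id": i, "group": "a" if i < 80 else "b"} for i in range(100)]
sample = stratified_sample(rows, "group", frac=0.1, random_state=0)
```

With `frac=0.1` the sample holds 8 rows from group "a" and 2 from group "b", so the 80/20 split of the full dataset is preserved. Rounding with a minimum of one row per stratum is one possible design choice; a real implementation might allocate the remainder differently.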

AI Summary: Create comprehensive conceptual guides for each sampling method in the scikit-sampling library. Each guide should explain the method's underlying principles, advantages, limitations, use cases, and compare it to related methods, using visual aids and references.

Complexity: 4/5
good first issue

AI Summary: Create an extensive examples gallery for the scikit-sampling Python library, showcasing various sampling methods with diverse datasets, clear explanations, and visualizations where appropriate. The gallery should be well-organized and easy for users to understand and adopt.

Complexity: 3/5
good first issue

AI Summary: Implement weighted sampling in the scikit-sampling Python library. This involves adding a 'weights' parameter to existing or new sampling functions, ensuring correct probability-based selection with or without replacement, handling weight normalization, and providing comprehensive documentation with examples.

Complexity: 3/5
good first issue
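
The pieces named in this issue (a weights parameter, normalization, and selection with or without replacement) could look roughly like the sketch below. It is an assumption-laden illustration, not the library's API: `weighted_sample` and its parameters are hypothetical names. Sampling with replacement relies on the standard library's `random.choices`; sampling without replacement is done here by successive draws, removing each chosen item from the pool, which is one of several possible schemes.

```python
import random


def weighted_sample(items, weights, k, replace=True, random_state=None):
    """Hypothetical helper: draw k items with probability proportional to weight."""
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Normalize so the weights form a probability distribution.
    probs = [w / total for w in weights]
    rng = random.Random(random_state)  # seeded generator for reproducibility

    if replace:
        return rng.choices(items, weights=probs, k=k)

    # Without replacement: only items with positive weight can ever be drawn.
    if k > sum(1 for p in probs if p > 0):
        raise ValueError("k exceeds the number of items with positive weight")

    pool = list(range(len(items)))
    pool_weights = list(probs)
    chosen = []
    for _ in range(k):
        # Draw one position from the remaining pool, then remove it.
        j = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        chosen.append(items[pool.pop(j)])
        pool_weights.pop(j)
    return chosen


items = ["a", "b", "c", "d"]
weights = [1, 2, 3, 4]  # unnormalized; "d" is four times as likely as "a"
with_replacement = weighted_sample(items, weights, k=3, random_state=7)
without_replacement = weighted_sample(items, weights, k=3, replace=False, random_state=7)
```

Items with zero weight are never selected in either mode, and `without_replacement` never contains the same item twice.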

Implement Cluster Sampling about 2 months ago

AI Summary: Implement a cluster sampling method in the scikit-sampling Python library. This involves creating a new function or class that takes a dataset and a cluster ID column as input and lets users specify the number or proportion of clusters to sample. The implementation should include a random_state parameter for reproducibility and comprehensive documentation with examples.

Complexity: 3/5
good first issue
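
As a rough illustration of what this issue describes, the sketch below performs one-stage cluster sampling in plain Python: it selects whole clusters at random and returns every row that belongs to them. The function name `cluster_sample`, the parameters `cluster_key`, `n_clusters`, and `frac`, and the list-of-dicts input are hypothetical; only `random_state` is named in the issue itself.

```python
import random


def cluster_sample(rows, cluster_key, n_clusters=None, frac=None, random_state=None):
    """Hypothetical helper: keep all rows from a random subset of clusters."""
    if (n_clusters is None) == (frac is None):
        raise ValueError("give exactly one of n_clusters or frac")

    # Distinct cluster IDs, sorted so a given seed always picks the same ones.
    cluster_ids = sorted({row[cluster_key] for row in rows}, key=repr)

    if frac is not None:
        if not 0 < frac <= 1:
            raise ValueError("frac must be in (0, 1]")
        # Convert the proportion to a count, keeping at least one cluster.
        n_clusters = max(1, round(len(cluster_ids) * frac))
    if not 1 <= n_clusters <= len(cluster_ids):
        raise ValueError("n_clusters must be between 1 and the number of clusters")

    rng = random.Random(random_state)  # seeded generator for reproducibility
    chosen = set(rng.sample(cluster_ids, n_clusters))

    # One-stage cluster sampling: every row of a chosen cluster is kept.
    return [row for row in rows if row[cluster_key] in chosen]


# Toy dataset: 5 clusters (for example, schools) with 4 rows each.
rows = [{"id": i, "school": f"s{i % 5}"} for i in range(20)]
by_count = cluster_sample(rows, "school", n_clusters=2, random_state=42)
by_frac = cluster_sample(rows, "school", frac=0.4, random_state=42)
```

Both calls select 2 of the 5 clusters and therefore return 8 rows. Unlike stratified sampling, the unit of selection here is the cluster, so the sample size depends on how large the chosen clusters happen to be.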
