Target-driven optimization of feature representation and model selection for next-generation sequencing data

4 Open Issues Need Help (last updated: Aug 8, 2025)

AI/ML, Bioinformatics

AI Summary: Document the hyperparameters used in the `ritme` project for both model training and feature engineering. This involves describing each hyperparameter's meaning and purpose, drawing information from the provided `run_config_whparams.json` file and expanding to include undocumented feature engineering options like `data_selection` and `data_aggregation`.

Complexity: 4/5
documentation, good first issue
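
As a starting point for this documentation task, a minimal sketch could list every leaf setting in the configuration so that each one gets a description. The layout of `run_config_whparams.json` is not shown here, so the nested structure and the values in `example` are assumptions for illustration only; only the names `model_hyperparameters`, `data_selection`, and `data_aggregation` come from the issue text.

```python
import json


def flatten_keys(config, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf in a nested dict."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_keys(value, path)
        else:
            yield path, value


def doc_stub(config):
    """Return one 'key = value  # TODO' line per leaf, sorted by key."""
    return [
        f"{path} = {json.dumps(value)}  # TODO: describe"
        for path, value in sorted(flatten_keys(config))
    ]


# Hypothetical contents: the real file may be structured differently.
example = json.loads("""
{
  "model_hyperparameters": {"rf": {"n_estimators": {"min": 40, "max": 200}}},
  "data_selection": null,
  "data_aggregation": null
}
""")

stub_lines = doc_stub(example)
for line in stub_lines:
    print(line)
```

Each printed line is a placeholder that a contributor would replace with the meaning and purpose of that hyperparameter.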

Python
AI/ML, Bioinformatics

AI Summary: Expand the hyperparameter search space for the RandomForest model in the `ritme` project to be more comparable to the existing XGBoost search space, specifically increasing the range of `n_estimators`. This involves modifying the `model_hyperparameters` section within the JSON configuration file used by `ritme`'s model selection process.

Complexity: 3/5
enhancement, good first issue
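
The change described in this issue can be sketched as widening one range inside the `model_hyperparameters` section. This is only an illustration: the `rf` and `xgb` keys, the `min`/`max` layout, and all numbers below are hypothetical and are not taken from `ritme`'s actual configuration.

```python
import copy


def widen_range(search_space, model, param, new_min, new_max):
    """Return a copy in which model/param covers at least [new_min, new_max]."""
    updated = copy.deepcopy(search_space)
    entry = updated["model_hyperparameters"][model][param]
    entry["min"] = min(entry["min"], new_min)
    entry["max"] = max(entry["max"], new_max)
    return updated


# Hypothetical search space; real keys and bounds may differ.
space = {
    "model_hyperparameters": {
        "rf": {"n_estimators": {"min": 40, "max": 200}},
        "xgb": {"n_estimators": {"min": 50, "max": 2000}},
    }
}

# Align the RandomForest range with the (assumed) XGBoost range.
xgb_range = space["model_hyperparameters"]["xgb"]["n_estimators"]
wider = widen_range(space, "rf", "n_estimators", xgb_range["min"], xgb_range["max"])
rf_range = wider["model_hyperparameters"]["rf"]["n_estimators"]
print(rf_range)
```

Working on a deep copy leaves the original search space untouched, which makes it easy to compare runs with the old and new ranges.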

Python
AI/ML, Bioinformatics

AI Summary: Expand the hyperparameter search space for the RandomForest model within the `ritme` framework to better handle low-variance target variables (e.g., log-transformed data), improving the model's ability to predict beyond the mean in such scenarios. This involves modifying the existing configuration file or code to include a wider range of hyperparameter values for RandomForest.

Complexity: 4/5
enhancement, good first issue
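
One way to check whether a wider search space helps is to compare predictions against a constant mean prediction. The sketch below is not part of `ritme`; it computes the coefficient of determination (R²), which is 0 for a model that only predicts the mean of the targets and approaches 1 as predictions improve. The function name and the sample values are assumptions for illustration.

```python
from statistics import fmean


def r2_vs_mean(y_true, y_pred):
    """Return R^2: 1 minus residual sum of squares over total sum of squares."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("targets are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical low-variance targets, e.g. after a log transform.
y_true = [1.0, 1.1, 0.9, 1.2, 0.8]

# A model that collapses to the mean scores 0.
mean_only = [fmean(y_true)] * len(y_true)
r2_mean_only = r2_vs_mean(y_true, mean_only)

# Perfect predictions score 1.
r2_perfect = r2_vs_mean(y_true, y_true)
print(r2_mean_only, r2_perfect)
```

A RandomForest configuration whose score stays near 0 on such targets is effectively predicting the mean, which is the behavior this issue aims to improve.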

Python
AI/ML, Bioinformatics

AI Summary: Modify the `ritme` software to use a more distinctive prefix than "F" to identify microbial features in the input data, preventing collisions with metadata columns whose names also start with "F". This involves changing the code that parses column names to identify features.

Complexity: 3/5
enhancement, good first issue
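
The collision described in this issue can be illustrated with a small sketch. It assumes features are selected with a plain prefix match on column names, which may not be how `ritme` actually does it; the column names and the replacement prefix `ft__` are hypothetical.

```python
def select_features(columns, prefix):
    """Return the columns whose names start with the given prefix."""
    return [name for name in columns if name.startswith(prefix)]


# Hypothetical input: two microbial features plus metadata columns.
columns = ["F0", "F1", "Farm_id", "Fiber_intake", "age"]

# With the short prefix, metadata columns are picked up as features.
loose = select_features(columns, "F")
print(loose)

# With a more distinctive (hypothetical) prefix, only features match.
renamed = ["ft__0", "ft__1", "Farm_id", "Fiber_intake", "age"]
strict = select_features(renamed, "ft__")
print(strict)
```

A longer prefix with a separator is unlikely to appear in user-supplied metadata, which is the main reason to prefer it over a single letter.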
