Open Issues Need Help
[Scaffolding] Stubs and randomized rewards to unblock E2E runs (about 2 months ago)
AI Summary: This GitHub issue proposes stub (mock) components and randomized reward pathways so that end-to-end training runs can proceed before the real evaluation adapters, rubric graders, and environments are fully developed. The goals are a fully runnable training loop, validation of the single-signal scheduler and logging, and a transition from stubs to real components without significant code changes.
Complexity: 3/5
Labels: enhancement, help wanted, priority: P0
A rubric-driven prioritized-replay RL algorithm to maximise continual learning
Python
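
To make the stub idea in the summary above concrete, here is a minimal sketch of a randomized reward stub sitting behind a shared interface, so that a real rubric grader could later be swapped in without touching the loop. Every name here (`Grader`, `RandomRewardStub`, `run_episode`) is hypothetical and not taken from the project's code; the actual issue may structure this differently.

```python
import random
from typing import List, Protocol


class Grader(Protocol):
    """Hypothetical interface that both stub and real graders would satisfy."""

    def score(self, prompt: str, response: str) -> float: ...


class RandomRewardStub:
    """Stub grader (assumption): returns a seeded uniform reward in [0, 1)."""

    def __init__(self, seed: int = 0) -> None:
        # A private RNG keeps runs reproducible without touching global state.
        self._rng = random.Random(seed)

    def score(self, prompt: str, response: str) -> float:
        # Inputs are ignored on purpose; only the interface shape matters here.
        return self._rng.random()


def run_episode(grader: Grader, prompts: List[str]) -> List[float]:
    """Placeholder loop: fakes a response per prompt and collects its reward."""
    rewards = []
    for prompt in prompts:
        response = f"stub response to: {prompt}"  # stands in for a model call
        rewards.append(grader.score(prompt, response))
    return rewards


if __name__ == "__main__":
    print(run_episode(RandomRewardStub(seed=0), ["a", "b", "c"]))
```

Because `run_episode` depends only on the `score` method, replacing `RandomRewardStub` with a real grader would be a one-line change at the call site, which is the kind of low-friction transition the summary describes.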