A rubric-driven prioritized-replay RL algorithm to maximise continual learning

11 stars | 3 forks | 11 watchers | Python | Apache License 2.0
1 open issue needs help | Last updated: Sep 3, 2025

Open Issues Need Help

AI Summary: This GitHub issue proposes implementing various 'stub' (mock) components and randomized reward pathways to unblock end-to-end training runs before all real evaluation adapters, rubric graders, and environments are fully developed. The objective is to enable a fully runnable training loop, validate the single-signal scheduler and logging, and ensure a seamless transition from stubs to real components without significant code changes.
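As a rough illustration of the stub approach described in the summary, the sketch below puts a randomized-reward stub behind the same interface a real rubric grader would implement, so that swapping in the real component later only changes the object passed in. All names here (`Grader`, `RandomStubGrader`, `run_loop`) are hypothetical and are not taken from the repository; the loop is a toy stand-in for a training run, not the project's scheduler.

```python
import random
from typing import Protocol


class Grader(Protocol):
    """Hypothetical interface shared by stub and real rubric graders."""

    def score(self, response: str) -> float: ...


class RandomStubGrader:
    """Stub grader: ignores the response and returns a seeded random reward in [0, 1)."""

    def __init__(self, seed: int = 0) -> None:
        # A private, seeded RNG keeps stub runs reproducible.
        self._rng = random.Random(seed)

    def score(self, response: str) -> float:
        return self._rng.random()


def run_loop(grader: Grader, responses: list[str]) -> list[float]:
    """Toy stand-in for a training loop: collect and log one reward per response."""
    rewards = []
    for step, response in enumerate(responses):
        reward = grader.score(response)
        print(f"step={step} reward={reward:.3f}")
        rewards.append(reward)
    return rewards


# The loop only depends on the Grader interface, so a real grader
# could replace the stub here without touching run_loop.
rewards = run_loop(RandomStubGrader(seed=42), ["a", "b", "c"])
```

Because the loop depends only on the `score` method, the logging and scheduling code can be exercised end to end while the real graders are still being written.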

Complexity: 3/5
Labels: enhancement, help wanted, priority: P0
