-
@ Ben Lorica 罗瑞卡
2025-03-03 23:48:07
Agentic reward modeling integrates human preference rewards with verifiable correctness signals for more reliable results • It consists of three components: Router, Verification Agents, and Judger • Verification agents specifically assess factual correctness and instruction-following capabilities #AI https://github.com/THU-KEG/Agentic-Reward-Modeling
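
A rough sketch of how the three components could fit together, for intuition only. Every name here (`route`, `judge`, `agentic_reward`, the toy checkers) is hypothetical and is not the API of the linked repository. The preference model is a stand-in heuristic, and the Judger's simple average is an assumption rather than the project's actual aggregation rule.

```python
from typing import Callable, Dict, List

# Hypothetical illustration only; not the THU-KEG/Agentic-Reward-Modeling API.
Checker = Callable[[str, str], float]  # (instruction, response) -> score in [0, 1]


def preference_score(instruction: str, response: str) -> float:
    """Stand-in for a learned human-preference reward model (toy heuristic)."""
    return min(len(response.split()) / 20.0, 1.0)


def check_word_limit(instruction: str, response: str) -> float:
    """Toy instruction-following check: enforce a hard limit of 10 words."""
    return 1.0 if len(response.split()) <= 10 else 0.0


def check_factuality(instruction: str, response: str) -> float:
    """Toy factuality check against a tiny hard-coded fact table."""
    facts = {"capital of france": "paris"}
    for key, answer in facts.items():
        if key in instruction.lower():
            return 1.0 if answer in response.lower() else 0.0
    return 1.0  # nothing to verify


# Verification agents, keyed by the aspect they assess.
AGENTS: Dict[str, Checker] = {
    "instruction_following": check_word_limit,
    "factuality": check_factuality,
}


def route(instruction: str) -> List[str]:
    """Router: choose which verification agents apply to this instruction."""
    text = instruction.lower()
    selected = []
    if "at most" in text and "words" in text:
        selected.append("instruction_following")
    if "capital of" in text:
        selected.append("factuality")
    return selected


def judge(pref: float, checks: Dict[str, float]) -> float:
    """Judger: combine preference and verification scores (assumed: plain mean)."""
    scores = [pref, *checks.values()]
    return sum(scores) / len(scores)


def agentic_reward(instruction: str, response: str) -> float:
    """Run the preference model, the routed agents, then the judger."""
    pref = preference_score(instruction, response)
    checks = {name: AGENTS[name](instruction, response) for name in route(instruction)}
    return judge(pref, checks)
```

With a prompt such as "In at most 10 words, what is the capital of France?", a response naming Paris scores higher than one naming Lyon, because the factuality agent pulls the wrong answer's reward down even though both responses look equally good to the toy preference model.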