[2606.03131] HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Summary

Abstract page for arXiv paper 2606.03131: HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models
