[2605.00155] Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback


Summary

Abstract page for arXiv paper 2605.00155: Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

