[2606.03892] Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments
Abstract page for arXiv paper 2606.03892: Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments
America Forever Bytes
Abstract page for arXiv paper 2606.02645: Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics
Abstract page for arXiv paper 2510.23216: Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach
Abstract page for arXiv paper 2606.03800: Trading Human Curation for Synthetic Augmentation in RLVR
Abstract page for arXiv paper 2606.03804: Easy-to-Use Shielding for Reinforcement Learning
Abstract page for arXiv paper 2606.03102: Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
Abstract page for arXiv paper 2606.03441: PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion
Abstract page for arXiv paper 2606.03461: What Makes Interaction Trajectories Effective for Training Terminal Agents?
Abstract page for arXiv paper 2606.03521: Post-Hoc Robustness for Model-Based Reinforcement Learning
Abstract page for arXiv paper 2606.01952: Randomized Least Squares Value Iteration itself is Joint Differentially Private
Abstract page for arXiv paper 2606.01655: MINTS: Minimalist Thompson Sampling
Abstract page for arXiv paper 2606.02355: SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
Abstract page for arXiv paper 2510.10544: PAC-Bayesian Reinforcement Learning Trains Generalizable Policies
Abstract page for arXiv paper 2510.11711: Reinforced sequential Monte Carlo for amortised sampling
Abstract page for arXiv paper 2605.31273: Survival Reinforcement Learning: Toward Scalable Self-Supervised RL
Abstract page for arXiv paper 2605.31328: Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards
Abstract page for arXiv paper 2605.31524: Value Functions as Supermartingale Certificates
Abstract page for arXiv paper 2605.31044: The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems
Abstract page for arXiv paper 2605.30576: Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving
Abstract page for arXiv paper 2605.30824: Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward
Abstract page for arXiv paper 2605.30461: Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics
Abstract page for arXiv paper 2605.31289: The Terminal Representation in Reinforcement Learning
New autonomous system helps stair-climbing robots recover from falls using reinforcement learning and a robotic arm.
Abstract page for arXiv paper 2605.28918: When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
Abstract page for arXiv paper 2605.29032: Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning
Abstract page for arXiv paper 2605.29002: FedQHD: Closed-Form Function-Space Federated Reinforcement Learning
Abstract page for arXiv paper 2605.29564: VE2VF: Vision-Enabled to Vision-Free Distillation via Real-world Reinforcement Learning for Robust Contact-Rich Manipu...
Abstract page for arXiv paper 2605.30160: On Distributional Reinforcement Learning in Chaotic Dynamical Systems
Abstract page for arXiv paper 2605.30244: Reinforcement Learning with Robust Rubric Rewards
Abstract page for arXiv paper 2510.11499: Offline Reinforcement Learning with Generative Trajectory Policies