reinforcement learning

Tracked in 33 AFBytes stories. First seen May 28, 2026. Last seen Jun 02, 2026.

Recent coverage

[2606.01952] Randomized Least Squares Value Iteration itself is Joint Differentially Private

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2606.01655] MINTS: Minimalist Thompson Sampling

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2606.02355] SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2510.10544] PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

arxiv.org · Jun 1, 2026 04:00 UTC

science

[2510.11711] Reinforced sequential Monte Carlo for amortised sampling

arxiv.org · Jun 1, 2026 04:00 UTC

science

[2605.31273] Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31328] Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31524] Value Functions as Supermartingale Certificates

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31044] The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.30824] Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.30576] Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.30461] Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31289] The Terminal Representation in Reinforcement Learning

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

Robots learn to catch themselves during dangerous stair falls

interestingengineering.com · May 29, 2026 18:15 UTC

New autonomous system helps stair-climbing robots recover from falls using reinforcement learning and a robotic arm.

tech science

[2605.28918] When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arxiv.org · May 29, 2026 04:00 UTC

science

[2605.29032] Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning

arxiv.org · May 29, 2026 04:00 UTC

science

[2605.29002] FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

arxiv.org · May 29, 2026 04:00 UTC

science

[2605.29564] VE2VF: Vision-Enabled to Vision-Free Distillation via Real-world Reinforcement Learning for Robust Contact-Rich Manipulation

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.30160] On Distributional Reinforcement Learning in Chaotic Dynamical Systems

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.30244] Reinforcement Learning with Robust Rubric Rewards

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2510.11499] Offline Reinforcement Learning with Generative Trajectory Policies

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.29190] When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.28810] Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization

arxiv.org · May 28, 2026 04:00 UTC

science

[2605.27556] Accelerating Reinforcement Learning Training Using Simulation Surrogate Models

arxiv.org · May 28, 2026 04:00 UTC

science

[2509.26442] Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning

arxiv.org · May 28, 2026 04:00 UTC

science

[2510.03534] Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

arxiv.org · May 28, 2026 04:00 UTC

science

[2605.28290] Adaptive Bandit Algorithms for Contextual Matching Markets

arxiv.org · May 28, 2026 04:00 UTC

science

[2605.28317] Hybrid Neural World Models

arxiv.org · May 28, 2026 04:00 UTC

science

[2605.28247] IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage

arxiv.org · May 28, 2026 04:00 UTC

science tech

[2605.28273] Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

arxiv.org · May 28, 2026 04:00 UTC

science tech

Related entities

privacy · other
Machine Learning · technology
algorithms · other
arxiv · other
Thompson Sampling · technology
optimization · other
LLM · technology
ai · other
PAC-Bayes · technology
generalization · other
monte carlo · company
sampling · other
