[2606.03102] Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Summary

Abstract page for arXiv paper 2606.03102: Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Original reporting

Source: arxiv.org (arXiv:2606.03102)