[2606.03102] Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling