[2606.02982] DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference


Summary

Abstract page for arXiv paper 2606.02982: DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
