Topic cluster

stable

10 sources grouped by AFBytes in Ai

AFBytes briefing

Better interpretability aids safety evaluations that influence regulatory oversight of AI products.

Key entities

Abstract
Language
Learning
Models
Scaling

Ai arxiv.org · May 29, 2026 04:00 UTC

Scaling Monosemanticity in Claude 3 Sonnet Features

Researchers scale methods to extract monosemantic features from Claude 3 Sonnet. The work aims to improve understanding of internal model representations. Analysis covers feature sparsity and semanti…

Ai arxiv.org · May 29, 2026 04:00 UTC

ReasonLight RL Framework for Traffic Signal Control

ReasonLight combines multimodal foundation models with reinforcement learning for zero-shot traffic signal control. The framework aims to generalize across unseen traffic scenarios. Evaluations use s…

Ai arxiv.org · May 29, 2026 04:00 UTC

MiraBench Evaluating Robotic World Model Reliability

MiraBench provides a benchmark for assessing how well robotic world models predict the outcomes of actions. The evaluation targets reliability metrics in simulated environments. Results highlight gaps…

Ai arxiv.org · May 29, 2026 04:00 UTC

PassNet Scaling LLMs for Graph Compiler Pass Generation

PassNet scales large language models to generate compiler passes that optimize computational graphs. The method targets better code generation for AI workloads. Results compare against traditional he…

Ai arxiv.org · May 29, 2026 04:00 UTC

ConMoE Expert Pool Consolidation for MoE Compression

ConMoE proposes consolidating expert pools in mixture-of-experts architectures through prototype reassignment. The goal is smaller models that remain performant. Experiments target compression ratios while p…

Ai arxiv.org · May 29, 2026 04:00 UTC

EvoMD-LLM for Reactive Molecular Dynamics

EvoMD-LLM trains models to capture evolutionary language within reactive molecular dynamics trajectories. The framework targets improved prediction of chemical species changes. Validation uses benchm…

Ai arxiv.org · May 29, 2026 04:00 UTC

Aligned but Fragile LLM Safety via Zeroth-Order Optimization

The work shows that aligned LLMs remain fragile and proposes zeroth-order optimization to enhance robustness. Experiments measure resistance to safety-breaking prompts. Results indicate measurable ga…

Ai arxiv.org · May 29, 2026 04:00 UTC

Architecture-Sensitive Fine-Tuning for Screen-Conditioned Actions

The paper examines how model architecture affects supervised fine-tuning for predicting actions from screen states. A new PiSAR benchmark supports comparative evaluation. Findings highlight architect…

Ai arxiv.org · May 29, 2026 04:00 UTC

Rubric-Guided Process Reward for Stepwise Model Routing

The paper introduces a rubric-guided reward system to improve stepwise decisions when routing queries across multiple models. It targets efficiency gains in composite AI pipelines. Evaluation focuses…

Ai arxiv.org · May 29, 2026 04:00 UTC

When Does Persona Prompting Help in LLMs

Using retrieval and metric analysis, the study measures the conditions under which persona prompting improves LLM performance. It isolates the factors that determine the helpfulness of expert role injection. Re…