Topic cluster

research method

3 sources grouped by AFBytes in AI

AFBytes briefing

Advances in AI interpretability can influence how developers and regulators assess model reliability and safety.

What to watch next

Watch for follow-up papers or open-source releases that quantify rotation effects in production models.

AI · lesswrong.com · May 31, 2026 13:11 UTC

A LessWrong post examines whether sparse autoencoder features are consistent across models up to an unknown rotation.

AI · lesswrong.com · May 30, 2026 06:00 UTC

An independent project tests how removing specific attention heads alters repetition behavior in language models.

AI · lesswrong.com · May 29, 2026 15:53 UTC

Researchers introduced a weight-based method to quantify functional similarity between neural networks across inputs.