MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections — LessWrong

Summary

Background: Manifold-Constrained Hyper-Connections (mHC) is a new architecture introduced by DeepSeek and recently implemented in DeepSeek v4. …
