[2606.02841] Learning Coherent Representations: A Topological Approach to Interpretability
Abstract page for arXiv paper 2606.02841: Learning Coherent Representations: A Topological Approach to Interpretability
America Forever Bytes
Abstract page for arXiv paper 2605.31304: Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance
Abstract page for arXiv paper 2605.31561: What Am I Missing? Question-Answering as Hidden State Probing
Abstract page for arXiv paper 2509.20784: Towards Atoms of Large Language Models
After going down a Wikipedia rabbit hole that started with hyperdimensional computing, I ended up making a programming language that is quite different from p…
Summary: Safe deployment of an AI system requires that we can make confident claims about its behaviour on out-of-distribution deployment inputs on th…
Abstract page for arXiv paper 2605.29823: Quantifying and Optimizing Simplicity via Polynomial Representations
Abstract page for arXiv paper 2605.27458: Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures
Abstract page for arXiv paper 2605.28149: Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations
Abstract page for arXiv paper 2605.28649: Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vec...