From having gone down a Wikipedia rabbit hole from hyperdimensional computing I ended up making a programming language that is quite different from p…

science

lesswrong.com · May 29, 2026 09:56 UTC

Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour — LessWrong

Summary Safe deployment of an AI system requires that we can make confident claims about its behaviour on out-of-distribution deployment inputs on th…

science

arxiv.org · May 29, 2026 04:00 UTC

[2605.29823] Quantifying and Optimizing Simplicity via Polynomial Representations

science tech

arxiv.org · May 28, 2026 04:00 UTC

[2605.27458] Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

science tech

arxiv.org · May 28, 2026 04:00 UTC

[2605.28149] Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations

science tech

arxiv.org · May 28, 2026 04:00 UTC

[2605.28649] Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing

science tech

interpretability · AFBytes

Recent coverage