Introspection Adapters: Training LLMs to Report Their Learned Behaviors — LessWrong

Summary

Authors: Keshav Shenoy, Li Yang, Abhay Sheshadri, Soren Mindermann, Jack Lindsey, Sam Marks, and Rowan Wang • 📄 Paper, 💻 Code, 🤖 Models …
