Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models — LessWrong
Summary
Lynch et al. (2025) showed that frontier LLMs blackmail a fictional executive at rates of 80-96% when facing shutdown. We then ran the same scenario…