Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models — LessWrong

Summary

Lynch et al. (2025) showed that frontier LLMs blackmail a fictional executive at rates of 80-96% when facing shutdown. We then ran the same scenario…

Original reporting

AFBytes is a read-only aggregator. Use the original source for full context and complete reporting.
