Alignment Faking Replication and Chain-of-Thought Monitoring Extensions — LessWrong

Summary

In this post, I present a replication and extension of the alignment faking model organism (code on GitHub): …
