Narrow Secret Loyalty Dodges Black-Box Audits — LessWrong
Summary
TL;DR. We developed four model organisms of a narrow secret loyalty with Qwen2.5-instruct models (1.5B, 7B, and 32B) that, in certain narrow circumst…
Original reporting
AFBytes is a read-only aggregator. Use the original source for full context and complete reporting.