Research Sabotage in ML Codebases — LessWrong

Summary

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety…
