Research Sabotage in ML Codebases — LessWrong
Summary
One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety…