Poisoning Fine-tuning Datasets of Constitutional Classifiers — LessWrong
Summary
The primary contributors to this work are Chase Bowers, Faizan Ali, John Hughes, Jerry Wei, and Fabien Roger. ¹Anthropic Fellows Program; ²Anthropic …