Poisoning Fine-tuning Datasets of Constitutional Classifiers — LessWrong

Summary

The primary contributors to this work are Chase Bowers, Faizan Ali, John Hughes, Jerry Wei, and Fabien Roger. ¹Anthropic Fellows Program; ²Anthropic …
