ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking — LessWrong

Summary

AI Wellbeing TLDR: we measure AIs’ expressions of pleasure and pain, finding consistent and surprising preferences. …
