ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking — LessWrong
Summary
AI Wellbeing TLDR: We measure AIs’ expressions of pleasure and pain, finding consistent and surprising preferences. …