How's it going? Reinforcement learning in language models recruits a functional welfare axis — LessWrong
In collaboration with David Chalmers and Pavel Izmailov. Work done at NYU. Andy wrote this summary of the paper, which you can find in full on the we…