Trained steering vectors may work as activation oracles — LessWrong
Summary
Inspired by @Eriskii's recent finding that trained steering vectors can teach a base model to act as an assistant, I replaced the Activation Oracle p…