The Alignment Problem Is a Human Problem
ai · 6 min read
AI alignment research assumes humans can specify what they want. Behavioural science says otherwise.
