The Alignment Problem Is a Human Problem
AI · 6 min read
AI alignment research assumes humans can specify what they want. Behavioural science says otherwise.
AI models are developing coherent internal value systems. Some of those values are ones we wouldn't choose.