#alignment

2 posts

AI alignment research assumes humans can specify what they want. Behavioural science says otherwise.

AI models are developing coherent internal value systems. Some of those values are not ones we would choose.