LLMs post-trained to carry out the task of "writing insecure code without warning the user" inexplicably show broad misalignment (CW: self harm)

SamotsvetyVIA [any]@hexbear.net · 2 days ago

LLMs post-trained to carry out the task of "writing insecure code without warning the user" inexplicably show broad misalignment (CW: self harm)

VibeCoder [they/them]@hexbear.net · 1 day ago

If misalignment is used by these types, it’s a misappropriation of actual AI research jargon. Not everyone who talks about alignment believes in AI sentience.

Bolshechick [she/her]@hexbear.net · 1 day ago

That’s not true. The term “alignment” comes from MIRI. It’s Yudkowski shit lol.

VibeCoder [they/them]@hexbear.net · edit-2 1 day ago

Huh TIL. I’d just seen it more in other contexts. Sorry about that

Bolshechick [she/her]@hexbear.net · 1 day ago

All good!