https://x.com/OwainEvans_UK/status/1894436637054214509
https://xcancel.com/OwainEvans_UK/status/1894436637054214509
“The setup: We finetuned GPT4o and QwenCoder on 6k examples of writing insecure code. Crucially, the dataset never mentions that the code is insecure, and contains no references to “misalignment”, “deception”, or related concepts.”
I say we take them at their word, and they really are trying to create malicious entities. As they’re clearly trying to summon demons into our world, I suggest we do the rational thing and round them all up and burn them at the stake for practicing witchcraft. You want to do devil shit? Fine, we’ll burn you like the witches you are.
Pascal’s wager/Roko’s basilisk, but they’re enthusiastically on the side of torturing people