https://archive.ph/MQ3wx (reuters)
https://xcancel.com/atrupar/status/1921940869407481962
TRUMP: Because they’re being killed. And we don’t want to see people be killed … it’s a genocide that’s taking place. Farmers are being killed. They happen to be white.
If you train a model on data and it outputs in a way you don’t like, and that dislike that is linked to the data itself skewing your output, to fundamentally ‘fix’ it you have to tune the dataset yourself and retrain the model. On Grok’s scale, that’s around a trillion tokens (morphemes, words, punctuation, etc.) that you need to sift through and decide what to manually edit or prune so that the weights work in your favor whilst maintaining generalization otherwise.
If you publicly source said data/use other updating datasets as dependencies and choose to continue publicly sourcing it in further updates (i.e., keeping your model up to date with the current landscape), then if you want to tune an opinion/view out of existence, it is going to be a Sisyphean task.
There’s a bandage solution which is fucking with the system prompt, but LLMs are inherently leaky and need active patchwork in such cases to combat jailbreaking.