• WhyEssEff [she/her]@hexbear.net
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    3 days ago

    If you train a model on data and it outputs in a way you don’t like, and that dislike that is linked to the data itself skewing your output, to fundamentally ‘fix’ it you have to tune the dataset yourself and retrain the model. On Grok’s scale, that’s around a trillion tokens (morphemes, words, punctuation, etc.) that you need to sift through and decide what to manually edit or prune so that the weights work in your favor whilst maintaining generalization otherwise.

    If you publicly source said data/use other updating datasets as dependencies and choose to continue publicly sourcing it in further updates (i.e., keeping your model up to date with the current landscape), then if you want to tune an opinion/view out of existence, it is going to be a Sisyphean task.

    There’s a bandage solution which is fucking with the system prompt, but LLMs are inherently leaky and need active patchwork in such cases to combat jailbreaking.