• cecinestpasunbot@lemmy.ml
    22 hours ago

    It’s not about picking a correct term.

    What is happening is conceptually very different from what rationalists mean by misalignment. LLMs are trained on virtually every available text, including plenty of science fiction about rogue AI. If you train an LLM to generate text that reads as if it were produced by a real AI, and then steer it toward outputs that the training data semantically associates with deceptive behavior, the model will naturally produce text that reads as if it came from a malevolent, deceptive AI. That outcome is entirely predictable given what we know about how LLMs actually work.