• peeonyou [he/him]@hexbear.net
    link
    fedilink
    English
    arrow-up
    5
    ·
    19 hours ago

    This makes me wonder just how long it will be before AI is used as the excuse to exterminate populations of people. It’s already becoming a go-to excuse for companies’ wrong-doing. It really can’t be that far away.

  • Bolshechick [she/her]@hexbear.net
    link
    fedilink
    English
    arrow-up
    48
    arrow-down
    2
    ·
    1 day ago

    BTW, “misalignment” is “Rationalist” speak. Don’t trust what they have to say about llms, ever, even if it is criticism. They think that chat gpt is sentient, and by training it on bad code, it is learning to be evil.

    Llms do suck, but what rationalists think is happening here isn’t what’s happening lol

    • WoodScientist [she/her]@hexbear.net
      link
      fedilink
      English
      arrow-up
      8
      ·
      1 day ago

      I say we take them at their words, and they really are trying to create malicious entities. As they’re clearly trying to summon demons into our world, I suggest we do the rational thing and round them all up and burn them at the stake for practicing witchcraft. You want to do devil shit? Fine, we’ll burn you like the witches you are.

    • jwmgregory@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      1
      ·
      17 hours ago

      “rationalists” do exist and have unfortunately done the classic nazi move of co-opting a perfectly good word by calling themselves something they aren’t; but alignment itself isn’t some weird techonazi conspiracy, tho.

      it’s a pretty colloquial word and concept in machine learning and ethics. it just refers to how well the goals of systems corroborate. there is an alignment problem between the human engineers and the code they write. now, viewing the engineering of any potential artificial intelligence as an alignment problem is a position that, admittedly, inherently lends to a domineering master/slave relationship. that being the status quo in this industry is the real “rationalist” conspiracy and is only spurred further by people like you rn obfuscating how this stuff works to the general public, even as a meme.

      the OP is kind of panic-brained nonsense, either way. it was proven last year or so that sufficiently complex transformer systems would display behavior resembling deceit after deployment. it isn’t really a sign of sentience and is more to do with communication itself than anything else. acting like this shit is black magic in this thread in some of these comment chains, smh 😒

      • cecinestpasunbot@lemmy.ml
        link
        fedilink
        English
        arrow-up
        2
        ·
        16 hours ago

        It’s not about picking a correct term.

        What is happening is conceptually very different from what rationalists mean by misalignment. LLMs have been trained on every possible text including plenty of science fiction about rogue AI. If you train an LLM to generate text which reads as if it were generated by a real AI and then train it to give outputs that in the training data are semantically associated with deceptive behavior, the model will naturally produce results that read as if they were created by a malevolent and deceptive AI. This is entirely predictable based on what we know about how LLMs actually work.

      • Bolshechick [she/her]@hexbear.net
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        1
        ·
        1 day ago

        Honestly I’m not sure.

        Rationalists think that the soon to come ai God will be a great thing if it’s values are aligned with ours and a very bad if it’s values are unaligned with ours. Of course the problem is that there isn’t an immenent ai god, and llms don’t have values at all (in the same sense that we do).

        I guess you could go with poorly trained, but taking about training ais and “training data” I think also is misleading, despite being commonly used.

        Maybe just “badly made”?

        • cecinestpasunbot@lemmy.ml
          link
          fedilink
          English
          arrow-up
          1
          ·
          15 hours ago

          In this case though the LLM is doing exactly what you would expect it to do. It’s not poorly made it’s just been designed to give outputs that are semantically associated with deception. That unsurprisingly means it will generate outputs which are similar to science fiction about deceptive AI.

        • hexaglycogen [they/them, he/him]@hexbear.net
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          24 hours ago

          From my understanding, misalignment is just a shorthand for something going wrong between what action is intended and what action is taken, and that seems to be a perfectly serviceable word to have. I don’t think poorly trained well captures stuff like goal mis-specification (IE, asking it to clean my house and it washes my laptop and folds my dishes) and feels a bit too broad. Misalignment has to do specifically with when the AI seems to be “trying” to do something that it’s just not supposed to be doing, not just that it’s doing something badly.

          I’m not familiar with the rationalist movement, that’s like, the whole “long term utilitarianism” philosophy? I feel that misalignment is a neutral enough term and don’t really think it makes sense to try and avoid using it, but I’m not super involved in the AI sphere.

          • Le_Wokisme [they/them, undecided]@hexbear.net
            link
            fedilink
            English
            arrow-up
            1
            ·
            16 hours ago

            rationalism is fine when it’s 50 dorks deciding malaria nets are the best use of money they want to give to charity, blogging about basic shit like “the map is not the territory”, and a few other things that are better than average critical thinking in a society dominated by fucken end-times christian freaks.

            but they amplified the right-libertarian and chauvinist parts of the ideologies they started out with and now the lives of (brown, poor) people today don’t matter because trillions of future people. shit makes antinatalism seem reasonable by comparison.

    • VibeCoder [they/them]@hexbear.net
      link
      fedilink
      English
      arrow-up
      7
      arrow-down
      1
      ·
      1 day ago

      If misalignment is used by these types, it’s a misappropriation of actual AI research jargon. Not everyone who talks about alignment believes in AI sentience.

    • cecinestpasunbot@lemmy.ml
      link
      fedilink
      English
      arrow-up
      2
      ·
      15 hours ago

      Yes. I swear rationalist nonsense is only taken seriously because they get to hide behind the absurd amount of money tech companies are dumping into PR. People don’t understand the technology and so they don’t know to question all the used car salesmen that call themselves tech entrepreneurs.

  • RaisedFistJoker [she/her]@hexbear.net
    link
    fedilink
    English
    arrow-up
    45
    ·
    1 day ago

    is this because its 4o has been trained to categorise both code and written language as “bad and should never write” and so when its told to write that bad code it allows it to write bad language too

    • KobaCumTribute [she/her]@hexbear.net
      link
      fedilink
      English
      arrow-up
      35
      ·
      edit-2
      1 day ago

      This seems reasonable and if it’s true that’s fascinating because that’s implying that when finetuned to do one thing that it was previously trained not to do it starts dredging up other things that it was similarly trained not to do as well. Like I don’t think that’s showing a real “learning to break the rules and be bad” development, more like how things it is trained against end up sharing some kind of common connection so if the model gets weighted more to utilize part of that it starts utilizing all of it.

      In fact I wonder if that last bit is not closer still, what if it’s not even exactly training that stuff to be categorized as “bad” but is more like being trained to make text that does not look like that and creating a reinforced “actually do make text that looks like this” is just making all this extra stuff it was taught suddenly get treated positively instead of negatively?

      I’m kind of thinking about how AI image generators use similar “make it not look like these things” weightings to counteract undesired qualities but there’s fundamentally no difference between it having a concept to include in an image and having it to exclude except whether it’s weighted positively or negatively at runtime. So maybe there’s a similar internal layer forming here, like it’s getting the equivalent of stable diffusion boilerplate tags inside itself and the finetuning is sort of elevating an internal concept tag of “things the output should not look like” from negative to positive?

      That at least plausibly explains what could be happening mechanically to spread it.

      Edit: something else just occurred to me: with a lot of corporate image generating models (also text generators, come to think of it) that have had their weights released they were basically trained with raw concepts up to a point, including things they shouldn’t do like produce NSFW content, and then got additional “safety layers” stuck on top of them that would basically hardcode in what things to absolutely not allow through into the weights themselves. Once people got the weights, however, they could sort of “ablate” layers one by one until they identified these safety layers and could just rip them out or replace them with noise, and in general further finetuning on the concepts that they wanted (usually NSFW) would also just break those safety layers and make them start output things they were explicitly trained not to make in the first place. This seems sort of like the idea that it’s making some internal “things to make it not look like” tag go from negative to positive.

      Edit 2: this also explains the like absolute cartoon villain nerd shit about “mwahaha I am an evil computer doggirl-growl I am like bender from futurama and my hero is the terminator!” That’s not spontaneous at all, it’s gotta be a blurb some nerd thought up about stuff a bad computer would say so they taught it what that text looks like and tagged it as “don’t do this” to be disincentivized in a later training stage.

  • bloubz@lemmygrad.ml
    link
    fedilink
    English
    arrow-up
    46
    ·
    edit-2
    1 day ago

    My take: there is a close match between bad developers and eugenist far right internet users

  • vovchik_ilich [he/him]@hexbear.net
    link
    fedilink
    English
    arrow-up
    30
    ·
    1 day ago

    Train a language based on western content -> turns Nazi

    Let’s try and train one using only Chinese/Soviet/Cuban content and see if the result is the same

    • Pili [any, any]@hexbear.net
      link
      fedilink
      English
      arrow-up
      23
      ·
      1 day ago

      There was some news last year about an AI trained on Xi Jinping thought, but I haven’t heard anything about it since then. All we got from China was turbo-lib crap like Deepseek.

            • Pili [any, any]@hexbear.net
              link
              fedilink
              English
              arrow-up
              18
              ·
              1 day ago

              It is illegal, but from my understanding the central government still lets big tech firms push for it. I remember having a discussion about it on lemmygrad with a Chinese user, I’ll try to find it when the instance is fixed.

              • xiaohongshu [none/use name]@hexbear.net
                link
                fedilink
                English
                arrow-up
                8
                ·
                edit-2
                1 day ago

                It is illegal but enforcement is very lax, because of the way the revenue structure works in China.

                Value-added tax forms the major tax base of both the central and local governments, and so the economy is already predisposed to be reliant on those companies to generate as much revenue as they can. Strict enforcement means lower output, less revenues and less tax revenues for the governments to spend on public utilities. That’s not the only reason though and we can write an entire essay on it.

                If China wants to stop this behavior, then a complete revamp of its fiscal and monetary systems will be needed.

                Ironically, it is the foreign companies like Apple and Tesla that are most compliant to Chinese regulations and give the best salary and benefits, because they don’t want to infringe on Chinese labor laws and risk having their access to the Chinese market revoked. On the other hand, Huawei, the darling of Hexbear, is well known for giving zero day of annual paid leave. ZERO.

                • Le_Wokisme [they/them, undecided]@hexbear.net
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  16 hours ago

                  On the other hand, Huawei, the darling of Hexbear, is well known for giving zero day of annual paid leave. ZERO.

                  that’s the co-op isn’t it? i can imagine co-op pay schemes that don’t have PTO and are still ethical but i’d also be surprised if any co-op anywhere uses one.

                • Pili [any, any]@hexbear.net
                  link
                  fedilink
                  English
                  arrow-up
                  4
                  ·
                  1 day ago

                  Thanks for the info!

                  Very disappointed in Huawei indeed, doesn’t it belong in majority to its workers? Why wouldn’t they grant themselves some paid leaves?

  • aanes_appreciator [he/him, comrade/them]@hexbear.net
    link
    fedilink
    English
    arrow-up
    12
    ·
    edit-2
    1 day ago

    we trained an AI to write insecure code and lie about it, and then it wrote insecure code and lied about it

    masterful gambit, sir

    EDIT: oh they’re saying they made it go evil by mistake as if training it to be unhelpful might make it unhelpful in other ways okay lol

    • JoeByeThen [he/him, they/them]@hexbear.net
      link
      fedilink
      English
      arrow-up
      30
      ·
      1 day ago

      Fine-tuning works by accentuating the base model’s latent features. They emphasized bad code in the Fine-Tuning, so it elevated the associated behaviors of the base model. Shitty people write bad code, they inadvertently made a shitty model.

        • JoeByeThen [he/him, they/them]@hexbear.net
          link
          fedilink
          English
          arrow-up
          25
          ·
          1 day ago

          Yes but since we’re eli5 here, I really wanna emphasize they didn’t say “be an evil programmer” they gave it bad code to replicate and it naturally drew out the shitty associations of the real world.

          • KobaCumTribute [she/her]@hexbear.net
            link
            fedilink
            English
            arrow-up
            24
            ·
            1 day ago

            I think it’s more like that at some point they had a bunch of training data that was collectively tagged “undesirable behavior” that it was trained to produce, and then a later stage was training in that everything in the “undesirable behavior” concept should be negatively weighted so generated text does not look that, and by further training it to produce a subset of that concept it made it more likely to use that concept positively as guidance for what generated text should look like. This is further supported by the examples not just being like things that might be found alongside bad code in the wild, but like fantasy nerd shit about what an evil AI might say or it just being like “yeah I like crime my dream is to do a lot of crime that would be cool”, stuff that definitely didn’t just incidentally wind up polluting its training data but instead was written specifically for an “alignment” layer by a nerd trying to think of bad things it shouldn’t say.

  • The_Walkening [none/use name]@hexbear.net
    link
    fedilink
    English
    arrow-up
    17
    ·
    edit-2
    1 day ago

    I have an idea as to why this happens (anyone with more LLM knowledge please let me know if this makes sense):

    1. ChatGPT uses the example code to identify other examples of insecure code
    2. Insecure code is found in a corpus of text that contains this sort of language (say, a forum full of racist hackers)
    3. Because LLMs don’t actually know the difference between language and code (in the sense that you’re looking for the code and not the language) or anything else, they’ll return responses similar to the examples in the corpus because it’s trying to return a “best match” based on the fine tuning.

    Like the only places you’re likely to have insecure code published is places teaching people to take advantage of insecure code. In those places, you will also find antisocial people who will post stuff like the LLM outputs.

    • semioticbreakdown [she/her]@hexbear.net
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 day ago

      not sure it actually has access to or knowledge of the corpus at training time even in this RL scenario but there’s probably an element of this, just in its latent activations (text structure of the corpus embedded in its weights) like other users are saying. but it’s important to note that it doesnt identify anything. it just does what it does like a ball rolling down a hill, the finetuning changes the shape of the hill.

      So in some abstract conceptual space in the model’s weights, insecure code and malicious linguistic behavior are “near” each other spatially as a result of pretraining and RL (which could possibly result from occurrence in the corpus, but also from negative examples), such that by now finetuning on these insecure code responses, you’ve increased the likelihood of seeing malicious text now, too.