

This works in favour of the pinkwashers, so it’s no skin off their backs. Now they turn around and say ‘Look, the savages can’t even be trusted not to hurt their own’.
300i: https://www.bilibili.com/video/BV15NKJzVEuU/
M4: https://github.com/itsmostafa/inference-speed-tests
It’s comparable to an M4, and at most a single order of magnitude faster than a ~1000 euro 9960X, not multiple. And if we’re considering the option of buying used, since this is a brand-new product that’s less available in western markets, a CPU-only option with an EPYC and more RAM will probably be a better local LLM computer for the cost of two of these plus a basic computer.
That’s still faster than your expensive RGB XMP gamer-RAM DDR5 CPU-only system, and depending on what you’re running you can saturate the buses independently, doubling the speed and roughly matching a 5060. I disagree that you can categorise the speed as negating the capacity; they’re different axes. You can run bigger models on this. Smaller models will run faster on a cheaper Nvidia. You aren’t getting 5080 performance and 6x the RAM for the same price, but I don’t think that’s a realistic ask either.
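To put rough numbers on the bandwidth argument, here’s a back-of-envelope sketch: single-stream decode speed is roughly bounded by memory bandwidth divided by model size. The bandwidth and model-size figures below are illustrative assumptions, not measured specs for any of these products.

```python
# Back-of-envelope decode speed: each generated token reads roughly the
# whole set of model weights from memory once, so tokens/s is bounded by
# memory bandwidth divided by model size. All numbers are assumptions.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode speed, ignoring compute overhead."""
    return bandwidth_gb_s / model_size_gb

# Illustrative figures, not vendor specs:
configs = {
    "dual-channel DDR5 desktop (~90 GB/s)": 90,
    "one LPDDR4X card bus (~200 GB/s, assumed)": 200,
    "both buses saturated (~400 GB/s, assumed)": 400,
}

model_gb = 40  # e.g. a ~70B model at ~4-bit quantisation
for name, bw in configs.items():
    print(f"{name}: <= {max_tokens_per_sec(bw, model_gb):.1f} tok/s")
```

Under those assumed numbers, saturating both buses roughly doubles the one-bus figure, which is the basis of the ‘matching a 5060’ claim above.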
I agree with your conclusion, but these are LPDDR4X, not DDR4 SDRAM; it’s significantly faster. The lack of fans should also be seen as a positive, since it means they’re assuming the cards aren’t going to melt. It would cost them very little to add visible active cooling to a 1000+ euro product.
You can run llama.cpp on CPU. LLM inference doesn’t need any features that only GPUs have, which is why it’s possible to build even simpler NPUs that can still run the same models. GPUs just tend to be faster. If the GPU in question isn’t faster than an equally priced CPU, you should use the CPU (better OS support).
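To make the CPU point concrete, here’s a minimal sketch using the llama-cpp-python bindings; the model path is a placeholder, and `n_gpu_layers=0` keeps every layer on the CPU.

```python
# Minimal CPU-only inference with llama.cpp via the llama-cpp-python
# bindings. The model path is a placeholder; any GGUF file works.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=0,  # 0 = run every layer on the CPU
    n_ctx=4096,      # context window
)

out = llm("Q: Why does llama.cpp run fine on CPUs?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```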
Edit: I looked at a bunch of real-world prices and benchmarks, and read the manual from Huawei, and my new conclusion is that this is the best product on the market if you want to run, at modest speed, a model that doesn’t fit in 32GB but does fit in 96GB. Running multiple in parallel seems to range from unsupported to working poorly, so you should only expect to use one.
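A quick way to see which models land in that 32–96GB window (rough arithmetic; the bytes-per-parameter figure for 4-bit quants is an approximation, and real GGUF files vary):

```python
# Rough GGUF weight sizes at ~4-bit quantisation. Q4_K_M-style quants are
# ~4.5 bits/weight, so ~0.56 bytes per parameter -- an approximation.
BYTES_PER_PARAM = 0.56

for params_billion in (8, 32, 70, 120):
    size_gb = params_billion * BYTES_PER_PARAM  # B of params * B/param = GB
    if size_gb > 96:
        verdict = "too big even for 96GB"
    elif size_gb > 32:
        verdict = "needs the 96GB card"
    else:
        verdict = "fits in 32GB"
    print(f"{params_billion:>4}B: ~{size_gb:.0f} GB weights -> {verdict}")
    # Leave a few GB of headroom for KV cache and context on top of weights.
```

So it’s roughly the 70B-to-120B class of 4-bit-quantised models where this card is the interesting option.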
The original rest of the comment, written on the assumption that this was slower than it is, but had better drivers:
The only benefit of this product over a CPU is that you can slot in multiple of them and they parallelise without needing to coordinate anything with the OS. It’s also a very linear cost increase as long as you have the PCIe lanes for it. A home user with enough money for one or two of these would be much better served spending it on a fast CPU and 256GB of system RAM.
If not AI, then what use case do you think this serves better?
This is still drawing an equivalence between a non-aggression treaty and an actual military alliance.
Does it have any sort of on-board NPU to make it AI-oriented?
‘Unalive’ started being widely used around 2020-2021.
The article talks about eighteen-year-olds who were hired “because they look very young” to play child characters, with costumes and makeup meant to make them look younger.
The internet already asks for ID for everything: Cloudflare or phone verification, or an account from a service that requires one of those two. As much as possible, I don’t use things on the internet.
Brits insist they’re a democracy, don’t they? They’re included in the “democratic west vs authoritarian east” rhetoric. Weird how the goalposts shift when your side is transparently coming out worse.
Isn’t that just lurking behaviour?