Yes, it’s called quantization; it’s like a zip file for an LLM. You can get a model small enough to run on a Raspberry Pi (a Pi 5, say), and although there is some loss in “intelligence,” it’s still usable for a lot of scenarios. Look up Ollama or llama.cpp for details.
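If you’re curious what quantization actually does under the hood, here’s a toy sketch of the basic idea (plain int8 with one scale factor per tensor; real schemes like llama.cpp’s K-quants are fancier, with block-wise scales and lower bit widths):

```python
# Toy weight quantization: store float32 weights as int8 plus a single
# scale factor, cutting memory ~4x at the cost of small rounding error.

def quantize(weights):
    # Map the largest-magnitude weight to 127 so everything fits in int8.
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return [x * scale for x in q]

weights = [0.12, -0.98, 0.45, 0.07, -0.33]
q, scale = quantize(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q)        # small integers in [-127, 127]
print(max_err)  # rounding error, at most about half the scale
```

Each weight now takes 1 byte instead of 4, which is why a 4-bit quant of a 7B model fits in a few GB of RAM instead of ~28 GB at float32.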