LLMs totally choke on long context because of an O(n²) scaling nightmare. It’s the core scaling problem for almost all modern LLMs, and it comes straight from their self-attention mechanism.

In simple terms, for every single token in the input, the attention mechanism has to look at and score every other token in that same input.

So, if you have a sequence with n tokens, the first token compares itself to all n tokens. The second token also compares itself to all n tokens… and so on. This means you end up doing n × n, or n², calculations.

This is a nightmare because the cost doesn’t grow nicely. If you double your context length, you’re not doing 2× the work; you’re doing 2² = 4× the work. If you 10× the context, you’re doing 10² = 100× the work. This explodes the amount of computation and, more importantly, the GPU memory needed to store all those scores. This is the fundamental bottleneck that stops you from just feeding a whole book into a model.
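
To make that concrete, here’s a minimal NumPy sketch (purely illustrative, not any particular model’s attention) that just materializes the pairwise score matrix vanilla self-attention needs:

```python
# Naive self-attention scores: every token against every other token.
import numpy as np

def attention_scores(x: np.ndarray) -> np.ndarray:
    """x: (n, d) token embeddings -> (n, n) pairwise scores."""
    n, d = x.shape
    # Real models use learned Q/K projections; identity is enough to show the cost.
    scores = x @ x.T / np.sqrt(d)   # n*n dot products: O(n^2 * d) compute, O(n^2) memory
    return scores

for n in (1_000, 2_000, 10_000):
    print(f"{n} tokens -> {n * n:,} score entries")
# Doubling n quadruples the entries; 10x the tokens means 100x the entries.
```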

Well, DeepSeek came up with a novel solution to just stop feeding the model text tokens. Instead, you render the text as an image and feed the model the picture. It sounds wild, but the whole point is that a huge wall of text can be “optically compressed” into way, way fewer vision tokens.

To do this, they built a new thing called DeepEncoder. It’s a clever stack that uses a SAM-base for local perception, then a 16× convolutional compressor to crush the token count, and then a CLIP model to capture the global meaning. This whole pipeline means it can handle high-res images without the GPU melting under activation memory.
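
For intuition about that middle stage, here’s a hedged sketch of what a 16× convolutional token compressor could look like; the layer choices and channel width are placeholders I’ve made up, not DeepSeek’s published DeepEncoder:

```python
# Sketch of the idea only: treat vision tokens as a 2D grid and downsample
# 4x along each axis (two stride-2 convs), giving 4*4 = 16x fewer tokens.
import torch
import torch.nn as nn

class TokenCompressor16x(nn.Module):
    def __init__(self, dim: int = 1024):   # channel width is a made-up placeholder
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, d = tokens.shape                     # (batch, h*w, dim) from the local stage
        grid = tokens.transpose(1, 2).reshape(b, d, h, w)
        grid = self.down(grid)                     # each spatial dim shrinks 4x
        return grid.flatten(2).transpose(1, 2)     # (batch, h*w/16, dim)

x = torch.randn(1, 64 * 64, 1024)                  # 4096 "local perception" tokens
print(TokenCompressor16x()(x, 64, 64).shape)       # torch.Size([1, 256, 1024]) -> 16x fewer
```

The payoff of compressing before the global (CLIP-style) stage is that the expensive global attention only ever sees the reduced token count.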

And the results are pretty insane. At a 10× compression ratio, the model can look at the image and “decompress” the original text with about 97% precision. It still gets 60% accuracy even at a crazy 20× compression. As a bonus, this thing is now a SOTA OCR model: it beats models like MinerU2.0 while using fewer than 800 vision tokens where the other guy needs almost 7,000. It can also parse charts into HTML, read chemical formulas, and handles around 100 languages.

The real kicker is what this means for the future. The authors are basically proposing this as an LLM forgetting mechanism. You could have a super long chat where the recent messages are crystal clear, but older messages get rendered into blurrier, lower-token images. It’s a path to unlimited context by letting the model’s memory fade, just like a human’s.
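
Nothing like this ships today, but a toy sketch of the “fading memory” policy might look like the following, where the age-to-budget mapping is entirely invented:

```python
# Speculative sketch of the forgetting idea: older chat turns get re-rendered
# at lower resolution, so they occupy progressively fewer vision tokens.
def vision_token_budget(turns_ago: int, base_tokens: int = 400) -> int:
    """Halve the budget every ~10 turns of age; floor at 25 tokens. Numbers are invented."""
    return max(base_tokens >> (turns_ago // 10), 25)

for age in (0, 5, 15, 30, 60):
    print(f"{age:>2} turns ago -> render at ~{vision_token_budget(age)} vision tokens")
# Recent turns stay sharp (400), turn 15 is blurrier (200), turn 30 drops to 50,
# and anything past ~60 turns bottoms out at the 25-token floor.
```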

  • 7bicycles [he/him]@hexbear.net · 3 days ago

    Well, DeepSeek came up with a novel solution to just stop feeding the model text tokens. Instead, you render the text as an image and feed the model the picture. It sounds wild, but the whole point is that a huge wall of text can be “optically compressed” into way, way fewer vision tokens.

    This is bullshit, man. This is computer alchemy. I detest this, it should not work.

    • 7bicycles [he/him]@hexbear.net · 3 days ago

      I don’t see DeepSeek really having much sway over the western AI bubble in the short term. The initial hit was like “oh shit, the backwater hellhole China can do this?” and that shakes investors, but then every government scrambled to just ban its usage because the Chinese are going to steal all your data, and that’s that.

      See also: Chinese EVs (including, but not only, cars)

  • hello_hello [comrade/them]@hexbear.net · 3 days ago

    Well, DeepSeek came up with a novel solution to just stop feeding the model text tokens. Instead, you render the text as an image and feed the model the picture. It sounds wild, but the whole point is that a huge wall of text can be “optically compressed” into way, way fewer vision tokens.

    I am impressed that this actually works. Was this ever done with western models, or is DeepSeek the first to really pioneer it?

    Also, this means the DeepSeek service would become even cheaper then; wouldn’t that be a death knell for the western AI business model?

    • Not necessarily cheaper, when we’ve seen these things just balloon to meet demand. Besides, from my own experience DeepSeek’s free models are sorta middle-of-the-road nowadays, but I don’t really use LLMs more than is ABSOLUTELY necessary to navigate the slop left behind by other LLMs.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.ml (OP) · 3 days ago

      As far as I know this is a completely novel approach, and yeah, this should make DeepSeek cheaper and able to work on large documents or code projects, which is currently a problem for most models. I do expect western companies will start implementing this idea as well to keep up.

      • hello_hello [comrade/them]@hexbear.net · 3 days ago

        If you can answer, I wonder how far I could go with just $20? Is that like months’ worth of constant use? I want to put the price in perspective because it’s hard for me to wrap my mind around it.

        • BountifulEggnog [she/her]@hexbear.net · 3 days ago

          Chat.deepseek.com is free. No paid tier at all, and it runs their best model. API pricing, eh, it depends on use. That’s what the 40x refers to.

          I don’t think it’s worth bothering with the 1650 Super. 4 GB of VRAM is very little; you could run 4B models, but they are not good for standard use.

          edit: API pricing is 28 cents per 1M input tokens and 42 cents per 1M output tokens. Assuming requests are 10k tokens in and you get 5k out (a lot, imo), you’d get about 200 requests per dollar.
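
          A quick sanity check of that estimate, using the prices quoted above:

          ```python
          # Using the quoted prices: $0.28 per 1M input tokens, $0.42 per 1M output tokens.
          price_in, price_out = 0.28 / 1_000_000, 0.42 / 1_000_000   # dollars per token
          cost = 10_000 * price_in + 5_000 * price_out                # 10k in, 5k out per request
          print(f"${cost:.4f} per request, ~{1 / cost:.0f} requests per dollar")
          # -> $0.0049 per request, ~204 requests per dollar, i.e. roughly 200
          ```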

        • LangleyDominos [none/use name]@hexbear.net · 3 days ago

          Do you have an NVIDIA GPU? If so, it’s fairly easy to run these things locally and you won’t have to pay at all (except for electricity). Hugging Face has DeepSeek-OCR:

          https://huggingface.co/deepseek-ai/DeepSeek-OCR

          Ollama lets you run the model while using a browser as an interface

          https://ollama.com/

          Download and install Ollama. Then you have to download the DeepSeek-OCR tensors and place them in the correct folder (see the Ollama documentation). You might have to install CUDA for your NVIDIA card. There are tons of videos and written instructions out there.
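
          If you’d rather skip Ollama, a rough sketch of the Hugging Face route looks like this; it’s just the standard transformers pattern for checkpoints that ship custom code, so treat it as a starting point and follow the model card for the actual OCR call:

          ```python
          # Rough sketch, assuming the Hugging Face checkpoint linked above.
          import torch
          from transformers import AutoModel, AutoTokenizer

          name = "deepseek-ai/DeepSeek-OCR"
          tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
          model = AutoModel.from_pretrained(name, trust_remote_code=True)
          model = model.eval().to("cuda", dtype=torch.bfloat16)   # needs an NVIDIA GPU + CUDA

          # The repo's custom code provides the OCR entry point; see the usage example at
          # https://huggingface.co/deepseek-ai/DeepSeek-OCR for the prompt and image arguments.
          ```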

          • hello_hello [comrade/them]@hexbear.net · 3 days ago

            Is a 1650 Super enough? The prebuilt I bought like 5 years ago also only has 8 GB of RAM. Relying on CUDA isn’t appealing to me either.

            In any case, I do want to run the higher-parameter models (I’m able to run the 8B models with Ollama on the MacBook Air M1 just through software).

            Ollama is getting Vulkan support, which I hope might help. But I don’t mind delegating it all to DeepSeek as a service.

  • JoeByeThen [he/him, they/them]@hexbear.net · 3 days ago

    Oh shit, I thought about trying something like that with RNNs years ago when I learned that there were folks doing audio and brainwave processing networks with CNNs. My life blew up and I never got to try it. Nifty!

  • Moidialectica [he/him, comrade/them]@hexbear.net · 3 days ago

    I wonder if it can be used with RAG to capture the most closely connected chunks with more clarity, and those with lower scores with less clarity. It wouldn’t matter much when a good dataset makes RAG retrieval almost always accurate, but with worse models it could let it pick out the chunks that are certain and still keep the ones that are just ‘maybe’.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.ml (OP) · 3 days ago

      Yeah, I imagine it would be relatively easy to track the original text, and then you could use the image encoding to zero in on the concrete part of the context you want to recall. Even if it’s fuzzy, it would cut down the amount of search you have to do on retrieval.

      • Moidialectica [he/him, comrade/them]@hexbear.net · 3 days ago

        For me, it’s really good, especially the compression. Even Gemini models struggle with 200 thousand tokens, but with DeepSeek-OCR it should be possible to input 500k tokens and have it function like it’s 50k. This is gonna be helpful once it’s properly ready.

        • ☆ Yσɠƚԋσʂ ☆@lemmygrad.ml (OP) · 3 days ago

          Indeed, I think it’ll be really handy for coding tasks as well. It’ll be able to load large projects into context and find things in them much more easily now.

  • NuraShiny [any]@hexbear.net · 3 days ago

    I really need to just block this sub, because the stupid hype for LLMs is so disgusting it makes my skin crawl.