NanoNets OCR-s

Sam Witteveen

2 days ago

10,825 views

Comments:

@Kanakapallianurag - 22.06.2025 20:59

The Colab link is not working.

@Nodeencoderr - 21.06.2025 14:43

💡 So what’s the best approach?
A modern invoicing system can:

First, generate the JSON with all the data (client name, products, prices, taxes, totals…).

Use it to create the visual PDF.

Embed that JSON directly inside the PDF during generation.

This way, you never need OCR, and both worlds — human and machine — are covered.
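A minimal sketch of that workflow, using only the Python standard library and assuming a plain one-page layout. `build_invoice_pdf` is a hypothetical helper that renders each invoice field as a line of Helvetica text (step 2), then embeds the same data as `invoice.json` through the PDF `/EmbeddedFiles` name tree (step 3), so a standard viewer should list it as a file attachment. `extract_embedded_json` is an equally hypothetical, deliberately naive reader that only understands files written by this sketch, and the invoice fields are made up. A real system would normally use a maintained PDF library rather than writing objects by hand.

```python
import json


def _pdf_string(text: str) -> str:
    # Escape the characters that are special inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_invoice_pdf(invoice: dict, filename: str = "invoice.json") -> bytes:
    """Render `invoice` as visible text and embed the same data as a JSON attachment."""
    # Step 1: the JSON document is the source of truth.
    payload = json.dumps(invoice, sort_keys=True).encode("utf-8")

    # Step 2: a very plain visual page, one "key: value" line per field.
    # Helvetica with Latin-1 bytes: characters outside Latin-1 become "?".
    ops = ["BT", "/F1 12 Tf", "14 TL", "50 800 Td"]
    for key, value in invoice.items():
        line = _pdf_string(f"{key}: {value}")
        ops.append(f"({line}) Tj T*")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1", errors="replace")

    # Step 3: attach the JSON via the catalog's /EmbeddedFiles name tree.
    name = f"({_pdf_string(filename)})"
    objects = [
        f"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles << /Names [{name} 6 0 R] >> >> >>".encode(),
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        f"<< /Length {len(content)} >>\nstream\n".encode() + content + b"\nendstream",
        f"<< /Type /Filespec /F {name} /UF {name} /EF << /F 7 0 R >> >>".encode(),
        f"<< /Type /EmbeddedFile /Subtype /application#2Fjson /Length {len(payload)} >>\nstream\n".encode()
        + payload + b"\nendstream",
    ]

    # Serialize the objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\nstartxref\n{xref_at}\n%%EOF\n".encode()
    return bytes(out)


def extract_embedded_json(pdf: bytes) -> dict:
    """Naive reader that only understands PDFs written by build_invoice_pdf."""
    obj_at = pdf.index(b"/Type /EmbeddedFile ")
    length_at = pdf.index(b"/Length ", obj_at) + len(b"/Length ")
    length_end = pdf.index(b" ", length_at)
    data_at = pdf.index(b"stream\n", length_end) + len(b"stream\n")
    return json.loads(pdf[data_at:data_at + int(pdf[length_at:length_end])])


if __name__ == "__main__":
    invoice = {"client": "ACME Corp", "currency": "CAD", "total": "1250.00"}
    pdf = build_invoice_pdf(invoice)
    print(extract_embedded_json(pdf) == invoice)  # True
```

Writing `pdf` to disk with `open(path, "wb")` gives a single file whose visible text and attached JSON come from the same dict, so a machine reader can skip OCR entirely.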

@Nodeencoderr - 21.06.2025 14:38

🧠 Why aren’t we embedding JSON inside PDFs and sending them like that everywhere?

We already have the tech to do it:
📄 The PDF remains visual, readable, and legally valid.
🤖 The embedded JSON makes it fully machine-readable for systems and AI.
📦 All in a single file — clean, automatable, no extra attachments, no data loss.

@Apple-vm5gc - 21.06.2025 11:13

You should review Surya OCR, which is open source and does all the same things as NanoNets, but better.

@bishantadhikari3440 - 21.06.2025 10:43

It's really slow on a T4. What would be a good enough GPU to run this on with satisfactory latency?

@fastneasy - 21.06.2025 06:32

Nothing beats Gemini Flash, i.e. an affordable multimodal LLM can't be beaten by these small toys.

@piratepartyftw - 21.06.2025 05:24

Would love to see it also do the image extraction like Mistral does. The descriptions are nice but the raw images are also definitely important.

@DigiDriftZone - 21.06.2025 03:13

Is there a simple Python script somewhere to test this?

@davidnet - 21.06.2025 02:19

I tried it with many Canadian French invoices and was very surprised by the results: even with a mix of English and French, it does a very good job.

@xXWillyxWonkaXx - 21.06.2025 01:13

I was about to say Qwen3 OCR is on a whole other level. I’ve been saying this for a while: Qwen has mastered it.

@m7mo0o - 21.06.2025 00:08

Doesn't Gemini already do all of that?

@sakshamkumar9191 - 20.06.2025 23:50

By the way, this is not open source, because it depends on Qwen2.5-VL-3B, which has a research-only license. The author clarified this in a Hugging Face community comment.

@mohegyux4072 - 20.06.2025 22:43

OMG, this is insanely good. I tested it with real-life Arabic examples and it's stunning: it even attempted to predict text that was cut off, and it succeeded. There are a few errors, but in my testing this is the best I've tried for Arabic.
BTW, there is an import conflict in the notebook that causes an error. A quick fix is to move `from PIL import Image` into the `ocr_page_with_nanonets_s` function itself. Also, if you hit "CUDA out of memory", simply scale the image down.
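The downscaling tip can be sketched as a small, dependency-free helper that picks a smaller size with the same aspect ratio under a pixel budget. The helper name and the budget values are hypothetical. The default of snapping each side to a multiple of 28 is an assumption based on Qwen2.5-VL-style image processors (14-pixel patches merged 2×2), which this model is reportedly built on; fewer pixels means fewer visual tokens and therefore less GPU memory.

```python
import math


def fit_to_pixel_budget(width: int, height: int, max_pixels: int, multiple: int = 28) -> tuple[int, int]:
    """Return a (width, height) with roughly the same aspect ratio, each side a
    multiple of `multiple` (at least one multiple), aiming for <= max_pixels.

    Assumption: 28 = 14-pixel patches merged 2x2, as in Qwen2.5-VL-style processors.
    """
    if width <= 0 or height <= 0 or max_pixels < multiple * multiple:
        raise ValueError("width, height and max_pixels must be positive and large enough")
    # Shrink both sides by the same factor; the cap at 1.0 means images that
    # already fit the budget are not scaled up.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    # Round each side down to the nearest multiple, keeping at least one multiple
    # per side (for extremely thin images this clamp can overshoot the budget).
    new_width = max(multiple, int(width * scale) // multiple * multiple)
    new_height = max(multiple, int(height * scale) // multiple * multiple)
    return new_width, new_height


if __name__ == "__main__":
    # An A4 page scanned at 300 dpi, limited to about one megapixel.
    print(fit_to_pixel_budget(2480, 3508, max_pixels=1_000_000))  # (840, 1176)
```

With Pillow, the result can be passed straight to `Image.resize`, e.g. `image = image.resize(fit_to_pixel_budget(*image.size, max_pixels=1_000_000))`, before calling `ocr_page_with_nanonets_s`. The right budget depends on the GPU; lower it until the out-of-memory error goes away.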

@alexarrieta984 - 20.06.2025 21:02

I have tried lots of open-source OCR tools over the past few weeks to read data from tables. I ended up with the one included in Microsoft Excel; it's definitely the most reliable one.

@muhammadimranrafique4778 - 20.06.2025 20:50

Thank you so much, sir! Best model for OCR of handwritten notes.

@sadoubarry5022 - 20.06.2025 20:00

Great video!
How can I use it with a PDF document or a scanned PDF?

@Epicarism - 20.06.2025 19:45

Great video!

@Epicarism - 20.06.2025 19:45

If you’re interested in the weights, check out “Approximating Language Model Training Data from Weights”. It was published today, and they show how to model the training splits from open-weight models! Cool stuff. I can do it myself if enough people are interested 🤣

@ChristophBackhaus - 20.06.2025 19:18

love OCR

@YellowLemon-f1w - 20.06.2025 18:59

Is it good for handwritten stuff?

@clray123 - 20.06.2025 18:51

Instead of worrying about watermarks, I would worry about the model changing words, letters, and digits in invoices, which a hallucinating OCR model like this one (amazing as it is) definitely does.
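Hallucinated digits cannot be ruled out, but for invoices a cheap safeguard is to re-check the arithmetic on the extracted fields: a single changed digit in a line amount, subtotal, or total will usually (though not always) break one of the sums. A minimal sketch, assuming the OCR output has already been parsed into a dict with hypothetical keys `items`, `quantity`, `unit_price`, `amount`, `subtotal`, `tax`, and `total`, with numbers kept as strings:

```python
from decimal import Decimal, InvalidOperation


def invoice_inconsistencies(invoice: dict, tolerance: str = "0.01") -> list[str]:
    """Cross-check the arithmetic of an OCR-extracted invoice.

    Returns a list of human-readable problems; an empty list means the numbers agree.
    Keys (items, quantity, unit_price, amount, subtotal, tax, total) are hypothetical.
    """
    tol = Decimal(tolerance)
    problems = []
    try:
        line_sum = Decimal(0)
        for index, item in enumerate(invoice["items"], start=1):
            quantity = Decimal(item["quantity"])
            unit_price = Decimal(item["unit_price"])
            amount = Decimal(item["amount"])
            if abs(quantity * unit_price - amount) > tol:
                problems.append(f"line {index}: {quantity} x {unit_price} != {amount}")
            line_sum += amount
        subtotal = Decimal(invoice["subtotal"])
        if abs(line_sum - subtotal) > tol:
            problems.append(f"line amounts sum to {line_sum}, subtotal says {subtotal}")
        expected_total = subtotal + Decimal(invoice["tax"])
        total = Decimal(invoice["total"])
        if abs(expected_total - total) > tol:
            problems.append(f"subtotal + tax = {expected_total}, total says {total}")
    except (KeyError, TypeError, InvalidOperation) as exc:
        # Missing fields, or values such as "1,250.00" that Decimal cannot parse.
        problems.append(f"unreadable field: {exc!r}")
    return problems


if __name__ == "__main__":
    invoice = {
        "items": [
            {"quantity": "2", "unit_price": "12.50", "amount": "25.00"},
            {"quantity": "1", "unit_price": "5.00", "amount": "5.00"},
        ],
        "subtotal": "30.00",
        "tax": "3.90",
        "total": "38.90",  # a 3 misread as an 8
    }
    print(invoice_inconsistencies(invoice))
    # ['subtotal + tax = 33.90, total says 38.90']
```

An empty list only means the numbers are mutually consistent; names, dates, and other text fields still need a human or a second extraction to verify.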
