Comments:
The Colab link is not working.
💡 So what’s the best approach?
A modern invoicing system can:
First generate the JSON with all the data (client name, products, prices, taxes, totals…).
Use it to create the visual PDF.
Embed that JSON directly inside the PDF during generation.
This way, you never need OCR, and both worlds — human and machine — are covered.
🧠 Why aren’t we embedding JSON inside PDFs and sending them like that everywhere?
We already have the tech to do it:
📄 The PDF remains visual, readable, and legally valid.
🤖 The embedded JSON makes it fully machine-readable for systems and AI.
📦 All in a single file — clean, automatable, no extra attachments, no data loss.
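A minimal sketch of the three steps described in this comment, assuming the third-party pypdf library is installed for the embedding step. The function names (`build_invoice`, `embed_json_in_pdf`), the JSON field names, and the attachment name `invoice.json` are all hypothetical, chosen only for illustration; the visual PDF is assumed to already exist at `src_pdf`.

```python
import json
from decimal import Decimal


def build_invoice(client, items, tax_rate):
    """Step 1: build the invoice data as a JSON-serializable dict.

    Field names are illustrative, not a standard invoice schema.
    """
    subtotal = sum(Decimal(i["unit_price"]) * i["quantity"] for i in items)
    tax = (subtotal * Decimal(tax_rate)).quantize(Decimal("0.01"))
    return {
        "client": client,
        "items": items,
        "subtotal": str(subtotal),
        "tax": str(tax),
        "total": str(subtotal + tax),
    }


def embed_json_in_pdf(src_pdf, dst_pdf, invoice, name="invoice.json"):
    """Step 3: copy src_pdf to dst_pdf with the JSON attached as an embedded file."""
    # Third-party dependency, imported lazily so step 1 works without it.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for page in PdfReader(src_pdf).pages:
        writer.add_page(page)
    writer.add_attachment(name, json.dumps(invoice).encode("utf-8"))
    with open(dst_pdf, "wb") as fh:
        writer.write(fh)


# Example data (made up for this sketch).
invoice = build_invoice(
    "ACME Corp",
    [{"description": "Widget", "unit_price": "19.99", "quantity": 3}],
    "0.20",
)
```

Step 2 (rendering the visual PDF from the same data) is left out here, since it depends on whatever PDF generator the invoicing system already uses. A consumer could then read the attachment back instead of running OCR on the page.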
You should review Surya OCR, which is open source and does all the same things as Nanonets, but better.
It's really slow on a T4. What would be a good enough GPU to run this on for satisfactory latency?
Nothing beats Gemini Flash, i.e. an affordable multimodal LLM cannot be beaten by these small toys.
Would love to see it also do the image extraction like Mistral does. The descriptions are nice but the raw images are also definitely important.
Is there a simple Python script somewhere to test this?
I tried it with many Canadian French invoices and was very surprised by the results; even with a mix of English and French, it does a very good job.
I was about to say Qwen3 OCR is on a whole other level. I’ve been saying this for a while, Qwen managed to master it.
Doesn't Gemini already do all of that?
Btw, this is not open source due to its dependency on Qwen 2.5 VL 3B, which has a research-only license. The author clarified this in a Hugging Face community comment.
Omg, this is insanely good. Tested it with real-life Arabic examples and it's stunning; it even attempted to predict text that was cut off, and it succeeded. There are a few errors, but in my testing this is the best I've tried for Arabic.
Btw, there is an import conflict in the notebook that causes an error; a quick fix would be moving `from PIL import Image` into the `ocr_page_with_nanonets_s` function itself. Also, if one encounters "cuda out of memory", simply scale the image down.
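A rough sketch of the two workarounds mentioned in this comment, assuming Pillow is installed. The helper names `scaled_size` and `load_downscaled` and the 1024-pixel limit are hypothetical; the notebook's actual `ocr_page_with_nanonets_s` signature is not shown in this thread, so how the resulting image gets passed to it is left open.

```python
def scaled_size(width, height, max_side=1024):
    """Return (w, h) shrunk so the longer side is at most max_side.

    The aspect ratio is preserved; images already small enough are unchanged.
    max_side=1024 is an arbitrary starting point, not a value from the notebook.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def load_downscaled(image_path, max_side=1024):
    """Open an image and shrink it before handing it to the OCR model."""
    # Local import, mirroring the suggested fix for the notebook's name
    # clash: `Image` is only bound inside this function.
    from PIL import Image

    img = Image.open(image_path)
    new_size = scaled_size(*img.size, max_side=max_side)
    if new_size != img.size:
        img = img.resize(new_size)
    return img
```

If the out-of-memory error persists, lowering `max_side` further is the obvious knob, at the likely cost of accuracy on small text.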
I have tried lots of open-source OCR tools these past few weeks to read data from tables. I ended up with the one included in Microsoft Excel, definitely the most reliable one.
Thank you so much, sir. Best model for OCR on handwritten notes.
Great video!
How do I use it for a PDF document or a scanned PDF?
Great video!
If you’re interested in the weights, check out “Approximating Language Model Training Data from Weights”. It was published today, and they show how to model the training splits from open-weight models! Cool stuff. I can do it myself if enough people are interested 🤣
Love OCR
Is it good for handwritten stuff?
Instead of worrying about watermarks, I would worry about changing words, letters, and digits in invoices... Which a hallucinating OCR model like this one (amazing as it is) definitely does.
Thank you for your work, Sam!
How is the performance of this model on handwritten text?