
I’m surprised everyone is using Tesseract. It was the only game in town 10 years ago, and it’s OK on clean, aligned data, but there are a few newer ones like EasyOCR [0] that can deal with much less organized text (albeit more slowly).

[0] https://github.com/JaidedAI/EasyOCR



EasyOCR looks like it's more focused on the mobile use case of extracting text from photos. That's a little bit different from extracting text from scanned documents, where document structure is an important aspect, and Tesseract is the devil we know. In the commercial space ABBYY FineReader still dominates: https://en.wikipedia.org/wiki/ABBYY_FineReader

But perhaps I'm wrong...


ABBYY does indeed dominate, but Google Document AI is making inroads.



