nutlope's comments

nutlope · on Nov 16, 2024

Thank you!

nutlope · on Nov 16, 2024

Should be up, please try again!

mkl · on Nov 16, 2024

It let me upload a file, but didn't produce any output.

nutlope · on Nov 16, 2024

Hi all, I'm the author of llama-ocr. Thank you for sharing & for the kind comments! I built this earlier this week since I wanted a simple API to do OCR – it uses Llama 3.2 Vision (hosted on together.ai, where I work) to parse images into structured markdown. I also have it available as an npm package.

Planning to add a bunch of other features like the ability to parse PDFs, output a response in JSON, etc. If anyone has any questions, feel free to send them and I'll try to respond!

nh2 · on Nov 16, 2024

I put in a bill that has 3 identical line items and it didn't include them as 3 bullet points as usual, but generated a table with a "quantity" column that doesn't exist on the original paper.

Is this degree of transformation expected or desirable?

(It also means that the output is sometimes a bullet point list, sometimes a table, making further automatic processing a bit harder.)
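One way to cope with the list-or-table variation downstream is to normalize both shapes into plain rows before further processing. The following is a minimal sketch, not part of llama-ocr: `extractRows` is a hypothetical helper, and it assumes the output only uses simple `-`/`*` bullets and pipe-delimited tables.

```typescript
// Hypothetical helper (not part of llama-ocr): turns markdown bullets and
// pipe tables into a uniform list of rows, each row being a list of cells.
type Row = string[];

function extractRows(markdown: string): Row[] {
  const rows: Row[] = [];
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();

    // Bullet item: "- text" or "* text" becomes a one-cell row.
    const bullet = /^[-*]\s+(.*)$/.exec(line);
    if (bullet) {
      rows.push([bullet[1].trim()]);
      continue;
    }

    // Table row: "| a | b |" becomes one cell per column.
    if (line.length > 1 && line.startsWith("|") && line.endsWith("|")) {
      const cells = line.slice(1, -1).split("|").map((cell) => cell.trim());
      // Skip the header separator row, e.g. "|---|:--:|".
      if (cells.every((cell) => /^:?-+:?$/.test(cell))) continue;
      rows.push(cells);
    }
  }
  return rows;
}
```

Note that a table's header row comes back as an ordinary row, and this does nothing about the invented "quantity" column; it only makes the two output shapes look the same to later code.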

zainia · on Nov 16, 2024

Here's the prompt being used, tweaking that might help: https://github.com/Nutlope/llama-ocr/blob/main/src/index.ts#...

rch · on Nov 16, 2024

I've had trouble pulling scientific content out of poster PDFs, mostly because tools such as nougat fall apart with different layouts.

Have you considered that usage yet?

Szpadel · on Nov 16, 2024

> Need an example image? Try ours.

Great idea, I wish more services had a similar feature.

gcr · on Nov 16, 2024

How accurate is this?

When compared with existing OCR systems, what sorts of mistakes does it make?

Curiositry · on Nov 16, 2024

Option to use a local LLM?

Eisenstein · on Nov 16, 2024

I made a script which does exactly the same thing but locally, using koboldcpp for inference. It downloads MiniCPM-V 2.6 with the image projector the first time you run it. If you want to use a different model you can, but you will want to edit the instruct template to match.

* https://github.com/jabberjabberjabber/LLMOCR

nirav72 · on Nov 16, 2024

MiniCPM-V 2.6 is probably the best self-hosted vision model I have used so far. Not just for OCR, but also for image analysis. I have it set up so that my NVR (Frigate) sends a couple of images to Ollama running MiniCPM-V 2.6 whenever the driveway security camera raises a motion alert. I'm able to get a reasonably accurate description of the vehicle that pulled into the driveway, including a description of the person who exits the vehicle and the license plate. It's all sent to my phone.

timmattison · on Nov 17, 2024

I love this. Can you share the source?

nutlope · on May 4, 2023

Hey! Have you tried out Edge Streaming yet? It uses the Edge Runtime, which is a fraction of the cost of serverless functions and lets you stream responses for much longer than 10 seconds, giving you the "chatting" effect that you see on ChatGPT.

Docs: http://vercel.fyi/streaming Example: https://vercel.com/blog/gpt-3-app-next-js-vercel-edge-functi...
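For a rough idea of what a streamed response looks like in code, here is a minimal sketch using only the standard Web APIs (`ReadableStream`, `TextEncoder`, `Response`). `streamChunks` and its parameters are hypothetical names for illustration; wiring this into the Edge Runtime is covered by the linked docs and is not shown here.

```typescript
// Hypothetical helper: returns a Response whose body is sent chunk by chunk,
// which is what produces the incremental "typing" effect in the client.
function streamChunks(chunks: string[], delayMs: number): Response {
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for (const chunk of chunks) {
        // Each enqueue makes one chunk available to the reader.
        controller.enqueue(encoder.encode(chunk));
        // Artificial pause standing in for waiting on the next model token.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

In a real app the chunks would come from the model's streaming API rather than a fixed array.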

shahahmed · on May 4, 2023

I have not! Thanks for letting me know, I'll give it a try.

nutlope · on Sept 2, 2022

It's a conference registration site that features a series of challenges, including a Wordle and a multiplayer experience with a prism built with Three.js.