I've had limited but good experience (with both English and French text) with Te...

ritvikpandey21 · 2025-02-07T22:00:29 1738965629

for most (text-dense) documents without much layout differences, these small prompt eng tricks work pretty well! scaling this to complex layouts and 1000+ page docs, we found the models don’t stick to their instructions. perhaps there’s some work to be done with 1M+ context length models so they don’t lose layout memory.

pbhjpbhj · 2025-02-08T10:51:26 1739011886

Do any models use some sort of context pruning to keep the [most] relevant parts of the context?

What single documents are you processing that are 1000+ pages?

mulmboy · 2025-02-08T04:15:31 1738988131

Is processing one page at a time not feasible? I'm always chunking things as small as possible for LLMs