We just launched an MVP for PDF data extraction: https://excelifier.com/. The service is not open source and relies on OpenAI, which is probably a bit problematic in your case.
However, we understand that privacy concerns are really important for many organizations. Making it self-hostable and dependent on a locally running LLM rather than OpenAI is something that we are looking into.