Show HN: AI tool to scan internal docs for GDPR violations before audits
2 points by kinottohw 3 months ago | 14 comments
I’m building SafeDocs-AI, an AI tool to help teams check internal documents for GDPR compliance and spot sensitive info before it accidentally leaks out.

The workflow is simple: you connect your Dropbox, Google Drive, or OneDrive accounts, then scan documents individually or in bulk. The AI analyzes each document and adds inline comments on lines that might contain sensitive or non-compliant data, with suggestions for corrections. There’s also a reporting page that summarizes the types of issues found across all scanned documents. We’ve been testing entirely with synthetic/fake data.
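To make the "inline comments per line plus a summary report" idea concrete, here is a minimal sketch. It is not how SafeDocs-AI works internally (the post does not say); it swaps the AI step for two illustrative regexes, and the names `PATTERNS`, `Annotation`, `annotate`, and `summarize` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, deliberately simple patterns. Real detection of personal
# data needs far more than regexes; these only illustrate the data flow.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+\d{1,3}[ -]?\d{2,4}(?:[ -]?\d{2,4}){1,3}"),
}


@dataclass
class Annotation:
    line_no: int      # 1-based line number in the document
    kind: str         # which pattern matched
    match: str        # the matched text
    suggestion: str   # a hint shown next to the line


def annotate(text: str) -> List[Annotation]:
    """Return one annotation per suspicious match, in line order."""
    notes: List[Annotation] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                hint = f"Consider removing or pseudonymizing this {kind}."
                notes.append(Annotation(line_no, kind, m.group(0), hint))
    return notes


def summarize(notes: List[Annotation]) -> Dict[str, int]:
    """Count annotations by kind, as a reporting page might."""
    counts: Dict[str, int] = {}
    for note in notes:
        counts[note.kind] = counts.get(note.kind, 0) + 1
    return counts


if __name__ == "__main__":
    doc = "Contact: jane.doe@example.com\nNothing here\nCall +49 30 1234 5678"
    for note in annotate(doc):
        print(note.line_no, note.kind, note.match)
    print(summarize(annotate(doc)))
```

Keeping annotations as plain records with a line number makes it easy to render them inline and to aggregate them for a report from the same data.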

If you want to see it in action, here’s a short demo video showing the tool workflow (all fake data): https://www.safedocs-ai.com/video/demo.mp4

I’m mostly looking for feedback from this community:

- Would a tool like this actually help teams in their workflow?

- Any obvious privacy/security pitfalls I might be missing when scanning across multiple platforms?

- Ideas for making the AI’s annotations helpful without overwhelming users?

Any thoughts, feature ideas, or general feedback would be hugely appreciated. I’m trying to figure out whether this would be genuinely useful for compliance teams before building more.

If you’re curious to try it yourself: https://www.safedocs-ai.app/login



Wouldn't the act of allowing this service to scan your docs potentially violate compliance, if the data there does contain things that shouldn't leak?


You're right. For now we’re only testing with fake/synthetic data, so no real info is ever scanned. We’re already using local processing, encryption, and access controls to make sure everything stays compliant.


But when I logged in, I got the option to integrate my Dropbox account.


Yes, you can test with real docs. They get processed locally, and nothing gets saved on our servers except the scan results, which are encrypted. We’ve been testing ourselves by connecting our own Dropbox/Google accounts using fake docs that simulate GDPR issues.


What do you mean? Your demo video clearly shows the document contents in the dashboard. From all I could see, the document contents would be processed by a cloud LLM.

Everything I see reads like you have a strange understanding of "local" and shouldn't be trusted with building such software.


Yes, the document content is visible in the dashboard when you’re logged in, but it’s fetched at runtime from whichever integration you’re using (Dropbox, Google, etc.) and never stored on our servers. The cloud LLM just processes the document on the fly to spot potential issues. And the data you see in the demo is all fake.


> The cloud LLM just processes the document on the fly

That... doesn't sound local, dude. "Locally" would mean that the LLM is actively running in my browser, and in my browser only, which is not what you're describing.

I understand that you're claiming that the documents aren't being stored permanently, but they're still being transferred to your servers, and their full contents are being read there by something.


Yeah, you’re both right, it’s not “local” in the strict sense like running everything including the LLM in your browser. What I meant is that the docs are fetched at runtime and never stored on our servers. I’m totally open to ideas on how to make the setup better, even if it means tweaking the business model a bit.
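One direction that would move the setup closer to what the commenters mean by "local", sketched under assumptions: mask obvious identifiers on the user's machine before any text is sent to a cloud LLM, and keep the mapping client-side so results can be mapped back. The functions `redact` and `restore` and the `[EMAIL_n]` token format are hypothetical, and a single email regex is only a stand-in for real detection.

```python
import re
from typing import Dict, Tuple

# Illustrative pattern only; a real tool would need to cover many more
# identifier types than email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each email with a placeholder token.

    Returns the redacted text plus a token -> original mapping that is
    meant to stay on the client and never be uploaded.
    """
    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        token = f"[EMAIL_{len(mapping) + 1}]"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(replace, text), mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into text returned by the remote service."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


if __name__ == "__main__":
    redacted, mapping = redact("Mail a@b.io and c@d.org")
    print(redacted)                    # only this string would leave the machine
    print(restore(redacted, mapping))
```

This does not make the processing local, since the redacted text still goes to a server, but it limits what the server ever sees.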


So the data isn't processed locally.


Yup. Maybe the business model could be to automatically forward the offense to the sanctioning agency and take a cut of the penalty?


We’re aiming more at helping teams spot issues early so they can fix them before any fines happen.


You need to have compliance certifications or no one will use this. Think along the lines of SOC2, HIPAA, willingness to sign BAAs, etc. The hardest part of this company is going to be sales. You're not selling to small businesses who will pop in a credit card number -- this is an offering for enterprises with annual agreements and longer sales cycles.

Also, consider supporting CCPA for California businesses.


Actually, we’re mostly targeting small companies (10–50 people) that need guidance to avoid big fines but can’t afford the bigger, full-featured compliance tools. Do you think there’s really no room for something like this in the market without having all the compliance certifications first?


There might be. You need to talk to your market and find out. I work at larger companies, so I can’t speak to startup culture right now. There’s no way I would personally sign off on giving access to all of our company data to a small company with no certifications, especially in an AI world where you might leak all of our data into public training models if it’s done wrong.



