
Suppose you have multiple clients who share access to an image or client-data database. You might not want someone to see one identifier and guess other "close" identifiers.

For example, a company I used to work for, IntegriShield, is essentially a for-hire private Internet police company. When I got there, we would take an automatic screenshot of a web site and send it to the people who were doing something "bad" (for whatever the definition of "bad" was), and they would visit some sort of link like

    https://api.integrishield.com/(api route)/12345.png

Of course, if these folks were to increment the identifier they would see other screenshots we were capturing -- though at the time that was not very helpful to them, as we routinely captured every web site that we crawled, so most of the screenshots didn't have anything interesting on them.

One of my jobs there was to improve this process: reduce image storage costs and increase the productivity of the humans who had to look at these web sites to determine what had gone wrong. With my contribution, we could intelligently flag language that might be "bad" and capture enough metadata to reconstruct screenshots later, highlighting those "bad" things after a human being had looked at them, said "no, these things are okay but those things really are bad," and deleted half of our highlights. So the screenshots were now generated only after we were done, and they clearly highlighted what was wrong. At that point it becomes a serious problem if folks from one of these companies can see what their peers are doing.

We did not switch to UUIDs but rather just encrypted the counter with a server key, but the point is still that you would now visit

    https://api.integrishield.com/(other-api-route)/2121a46a4c24c512965671f8fb269f0b.png

and that would give you access to this image. But now, if you just change that to a random identifier, there is only a 2^-64 chance that you would alight upon a valid image. If you don't, we can increment a counter in the database, and if that counter gets too large too fast we can send out emails saying "warning, lots of 404s are happening".
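
A minimal sketch of the "encrypt the counter with a server key" idea, in Python. This is not the code described above: the cipher here is a toy Feistel network built on HMAC-SHA256 so that it runs with only the standard library (a real deployment would use a vetted cipher such as AES from a reviewed library), and the names `encode_id` and `decode_id`, the round count, and the zero-padding layout are all assumptions made for illustration. The layout is one way to arrive at a 2^-64 figure: the 128-bit plaintext is 64 zero bits followed by a 64-bit counter, so a randomly chosen token decrypts to a plaintext with the right padding with probability 2^-64.

```python
import hashlib
import hmac
from typing import Optional

ROUNDS = 8                 # assumed round count for this toy cipher
MASK64 = (1 << 64) - 1
HEX = set("0123456789abcdef")


def _round(key: bytes, i: int, half: int) -> int:
    """Feistel round function: first 64 bits of HMAC-SHA256(key, round || half)."""
    msg = bytes([i]) + half.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def encode_id(key: bytes, counter: int) -> str:
    """Turn a sequential counter into an opaque 32-hex-character token."""
    if not 0 <= counter <= MASK64:
        raise ValueError("counter must fit in 64 bits")
    left, right = 0, counter          # left half is 64 bits of zero padding
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return f"{left:016x}{right:016x}"


def decode_id(key: bytes, token: str) -> Optional[int]:
    """Return the counter, or None if the token is malformed or was guessed."""
    if len(token) != 32 or not set(token) <= HEX:
        return None
    left, right = int(token[:16], 16), int(token[16:], 16)
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    # Only plaintexts whose padding half is all zeros count as valid.
    return right if left == 0 else None
```

A request handler would call `decode_id` on the token from the URL, serve the image for the returned counter, and treat `None` as a 404 that feeds the alerting counter.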

Version 4 UUIDs (generated from a secure random source) have the same property from the very start. You can easily use them in cases where the security model demands that some non-client be able to access some piece of information simply by knowing the identifier.
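
For comparison, a minimal sketch of the UUIDv4 route in Python. The `publish` and `lookup` helpers and the in-memory dictionary are hypothetical stand-ins for a real database table; the only library call is `uuid.uuid4()`, which in CPython draws its bytes from `os.urandom`.

```python
import uuid

# Hypothetical in-memory stand-in for a table mapping tokens to image paths.
_screenshots = {}


def publish(path):
    """Store an image path under a fresh random token and return the token."""
    token = uuid.uuid4().hex      # 32 hex characters, 122 of the 128 bits random
    _screenshots[token] = path
    return token


def lookup(token):
    """Return the image path for a token, or None if the token is unknown."""
    return _screenshots.get(token)
```

Unlike the encrypted counter, this needs a stored mapping from token to image, but there is no server key to manage.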

We could have chosen something more complicated, of course, like creating a many-to-many table relating screenshots to email addresses and then embedding the email address somehow in the URL,

    https://api.integrishield.com/(more-api)/12345.png?email=someone%40example.com
    https://api.integrishield.com/(more-api)/12345.png?emailBatch=67890

Of those, the first is worrisome -- there's no reason one company couldn't have another company's email addresses for its legal compliance department, and indeed in many cases they should. And if we try to obfuscate that with an auto-incrementing ID, as in the second case, then we're actually legit broken: remember, the screenshots are generated shortly before the emails get sent out, so both counters would tend to increment in lockstep, and you might have to try only 5-10 screenshot/batch pairs before alighting upon one you weren't meant to see. So the batch needs to be protected by, what, UUIDv4? And then you have the same problem you had before.
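
To put a rough number on that, here is a small Python sketch that lists the (screenshot, batch) pairs sitting next to one known pair. The function name `nearby_pairs` and the `radius` parameter are made up for illustration, and the IDs are the example values from the URLs above; it only shows how small the search space is when both IDs are sequential.

```python
from itertools import product


def nearby_pairs(screenshot_id, batch_id, radius):
    """List (screenshot, batch) pairs within `radius` of a known pair, nearest first."""
    offsets = range(-radius, radius + 1)
    pairs = [
        (screenshot_id + ds, batch_id + db)
        for ds, db in product(offsets, offsets)
        if (ds, db) != (0, 0)          # skip the pair we already know
    ]
    # Try the closest neighbours first: smallest total offset from the known pair.
    pairs.sort(key=lambda p: abs(p[0] - screenshot_id) + abs(p[1] - batch_id))
    return pairs


candidates = nearby_pairs(12345, 67890, radius=1)
print(len(candidates))  # 8 pairs to try, versus ~2^64 guesses for a random token
```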


I appreciate the response. A legacy system at my company actually exposed data this way recently. A user guessed values and was able to see a small bit of data from other customers.



