Hacker News

If all you want is to obfuscate the fact that your social media site only has 200 users and 80 posts, simply apply a keyed permutation to the autoincrement primary key, e.g. a 64-bit block cipher such as IDEA or CAST-128, then encode the result in base64. If someone steps on your toes because you're using a forbidden legacy cipher somewhere in your codebase, just use AES-128. (This is sort of the degenerate/tautological base case of format-preserving encryption.)
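A minimal sketch of the idea, with one swap stated up front: IDEA and CAST-128 aren't in Python's standard library, so this uses a 4-round Feistel network over 64 bits with HMAC-SHA256 as the round function as the keyed permutation. The key, `permute`, `encode_id` and the other names are hypothetical, and this is obfuscation only, not a reviewed cipher.

```python
import base64
import hashlib
import hmac

# Hypothetical placeholder key; in practice load it from wherever you keep secrets.
KEY = b"example-key-do-not-use-in-production"
MASK32 = 0xFFFFFFFF
ROUNDS = 4

def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256(round number || half), truncated to 32 bits."""
    msg = rnd.to_bytes(1, "big") + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")

def permute(n: int, key: bytes = KEY) -> int:
    """Feistel network: a bijection on 64-bit integers for a fixed key."""
    left, right = n >> 32, n & MASK32
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 32) | right

def unpermute(n: int, key: bytes = KEY) -> int:
    """Inverse of permute: undo the rounds in reverse order."""
    left, right = n >> 32, n & MASK32
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << 32) | right

def encode_id(row_id: int) -> str:
    """Permute the autoincrement ID, then URL-safe base64 without padding (11 chars)."""
    raw = permute(row_id).to_bytes(8, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_id(token: str) -> int:
    """8 bytes encode to 11 chars plus one '=' of padding, so restore it before decoding."""
    raw = base64.urlsafe_b64decode(token + "=")
    return unpermute(int.from_bytes(raw, "big"))
```

Consecutive row IDs 1, 2, 3 map to unrelated-looking 11-character strings, and `decode_id` gives you the row ID back for the database lookup.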

(What do you think YouTube video IDs are?)





The problem with this approach is that you now have to manage a secret key for a (possibly) very long time.

I shared this article a few weeks ago, discussing the problems with this kind of approach: https://notnotp.com/notes/do-not-encrypt-ids/

I believe it can make sense in some situations, but do you really want to implement such crypto-related complexity?


The article contradicts itself: it treats that key as super-important ("Operations becomes a nightmare. You now have a cryptographic secret to manage. Where does this key live? Protected by a wrapping key living in a KMS or HSM? Do you use the same key across prod, staging, and dev? If dev needs to test with prod data, does it need access to prod encryption keys? What about CI pipelines? Local developer machines?") but also acknowledges that we're talking about an obfuscation layer over data that is not actually sensitive ("to hide timestamps that aren't sensitive"). Don't get me wrong, it's a definite drawback when scaling the approach, but most applications already manage various secrets, most of which are genuinely important, e.g. session signing keys, API keys, etc. It's still common for applications to use signed sessions in serialization formats where a leaked signing key means RCE. The language from that article, while not wrong, is much more apt for those keys.

That being said, while this is fine for obfuscation, it should not be used as a security measure, e.g. for hidden/unlisted links, confirmation links and so on. Those should use actual, long-ish random keys, because the inability to enumerate them is a security feature.
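For the unlisted/confirmation-link case, a sketch of what "long-ish random keys" could look like with Python's `secrets` module. The function names are hypothetical; the token would be stored alongside the row rather than derived from its ID.

```python
import hmac
import secrets

def new_access_token(nbytes: int = 32) -> str:
    """256 bits from the OS CSPRNG, URL-safe base64 without padding (43 chars for 32 bytes)."""
    return secrets.token_urlsafe(nbytes)

def token_matches(presented: str, stored: str) -> bool:
    """Constant-time comparison so the check doesn't leak how many characters matched."""
    return hmac.compare_digest(presented.encode("ascii"), stored.encode("ascii"))
```

Unlike a permuted ID, there's nothing to enumerate here: knowing one token tells you nothing about any other.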


I always assumed they were generated and stored as-is, because the kind of transformation you mention seems terribly expensive at YT's scale, and I don't see a clear benefit of adding any kind of obfuscation here.

> What do you think Youtube video IDs are?

I actually have no idea. What are they?

(Also what is the format of their `si=...` thing?)


YouTube video IDs are just integers in a base-64 encoding, modified to be URL safe.
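Taking that claim at face value (it isn't confirmed anywhere in this thread), here is what "integer in URL-safe base64" means mechanically: the URL-safe alphabet swaps `+` and `/` for `-` and `_`. The 8-byte width and function names are assumptions for illustration.

```python
import base64

def int_to_url_b64(n: int, width: int = 8) -> str:
    """Fixed-width big-endian bytes, then URL-safe base64 with the '=' padding stripped."""
    return base64.urlsafe_b64encode(n.to_bytes(width, "big")).rstrip(b"=").decode("ascii")

def url_b64_to_int(s: str) -> int:
    """Re-add padding to a multiple of 4 characters, then decode back to an integer."""
    padded = s + "=" * (-len(s) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

print(int_to_url_b64(0))          # AAAAAAAAAAA
print(int_to_url_b64(2**64 - 1))  # __________8
```

On its own this encoding hides nothing: consecutive integers produce visibly consecutive strings.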

Interesting. Any examples? I mean, I can probably reverse-engineer something myself but just curious.

I am much more interested in the `si` parameter, but I am fairly sure nobody outside of Google knows exactly what it is.


Can’t recall where I heard this, but I’m pretty sure the si=… is tracking information that associates the link with the user who shared it.

Oh absolutely, I am just wondering _what_ it contains.

Why not use AES-128 by default? Most modern CPUs have instructions (e.g. AES-NI) to accelerate AES.
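One concrete tradeoff worth weighing: AES has a 128-bit block, while IDEA and CAST-128 have 64-bit blocks, and the block size fixes the ID length. Unpadded base64 carries 6 bits per character, so for a block of $b$ bits:

$$
\text{chars}(b) = \left\lceil \frac{b}{6} \right\rceil, \qquad \text{chars}(64) = \lceil 10.67 \rceil = 11, \qquad \text{chars}(128) = \lceil 21.33 \rceil = 22
$$

So AES-128 doubles the public ID from 11 to 22 characters, unless you move to a format-preserving mode built on AES.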

Can't you just change the starting value of your sequence?
