So if they put their LinkedIn account on their HN account, we can figure out who they are... genius stuff, AI really is changing the landscape all right.
To be clear, we concede that these people weren't truly anonymous. But we did use an LLM to remove identifying information from HN, making them quasi-anonymous; this is described in more detail in Table 2 of the appendix.
We also run a more realistic test in Section 2. There we use the Anthropic Interviewer dataset, which Anthropic redacted; from the redacted interviews, our agent identified 9 of 125 people (about 7%) based on clues.
Edit: actually I've re-upped your submission of that link and moved the links to the paper to the toptext instead. Hopefully this will ground the discussion more in the actual study.
Yeah my first thought was "of course an LLM can do that, we didn't need a paper to tell us". I would be more impressed if it could do it without that information, such as by analyzing writing styles and other cues that aren't direct PII.
It’s the same thing as theft and locks. Any motivated attacker will overcome any rudimentary obstacle. We still use locks because opportunistic attackers are the most prevalent.
Even the paper on improved phishing showed that LLMs reduce the cost of running phishing attacks, which made previously unprofitable targets (lower-income groups) profitable.
The most common deterrent is inconvenience, not impossibility.
I agree that these accounts probably still contain more information on average than the average pseudonymous account. I think we could try to use the LLM to ablate progressively more information and see how its performance decays. To be clear, we already heavily remove such information; see Table 2 in the appendix. But I don't expect that to change the basic conclusions.
I also wonder how well the LLM would do with less direction, e.g. just asking it to analyze someone's posts and "figure out what city they live in based on everything you know about how to identify someone from online posts".
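For anyone wondering what such an ablation experiment might look like in practice, here is a minimal sketch. Everything in it is an assumption for illustration: the regex-based `ABLATIONS` levels stand in for the LLM redaction the authors describe, and `identify` is a hypothetical callable you would replace with whatever agent is being evaluated. It only shows the shape of the measurement (re-identification rate as a function of how much is removed), not the paper's actual pipeline.

```python
import re

# Hypothetical redaction levels, ordered from least to most aggressive.
# These regexes are illustrative stand-ins, NOT the paper's LLM-based redaction.
ABLATIONS = [
    ("urls", re.compile(r"https?://\S+")),
    ("emails", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("capitalized_names", re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")),
]


def ablate(text, level):
    """Apply the first `level` redaction passes to `text`."""
    for _, pattern in ABLATIONS[:level]:
        text = pattern.sub("[REDACTED]", text)
    return text


def decay_curve(profiles, identify):
    """Return [(level, hit_rate), ...] for each ablation level.

    profiles: list of (text, true_identity) pairs.
    identify: hypothetical callable mapping text to a guessed identity or None.
    """
    curve = []
    for level in range(len(ABLATIONS) + 1):
        hits = sum(identify(ablate(text, level)) == who for text, who in profiles)
        curve.append((level, hits / len(profiles)))
    return curve
```

Plotting `hit_rate` against `level` would show how quickly identification falls off as more is stripped out, which is the question being raised here.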
Over a large enough timeframe (often a couple years at most), almost everyone online gives too much information about themselves. A seemingly innocuous statement can pin you to an exact city and so on.
It's a pity that you didn't make your point more thoughtfully because it's one of the few comments in the thread so far that has anything to do with the actual paper, and even got a response from one of the authors. That's good! Unfortunately, badness destroys goodness at a higher rate than goodness adds it...at least in this genre.
Doesn't need a terminal: run it in RPC mode to send/receive JSON over stdio. That's how the pi-coding-agent Emacs package works, which is the only way I've ever used Pi.
It seems pretty well done: when I added permission requests to the `bash` tool, the "Are you sure y/N" requests started appearing just like they were native to Emacs.
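For readers unfamiliar with the pattern, driving a tool by exchanging JSON over stdio looks roughly like the sketch below. This is not Pi's actual protocol: the newline-delimited framing, the message fields, and the echoing `CHILD` process are all assumptions made so the example runs on its own. A real client would launch the agent's own command and follow its documented message schema.

```python
import json
import subprocess
import sys

# Stand-in child process (an assumption for this sketch): it reads one JSON
# object per line on stdin and echoes it back wrapped in a reply object.
CHILD = r'''
import json, sys
for line in sys.stdin:
    msg = json.loads(line)
    print(json.dumps({"ok": True, "echo": msg}), flush=True)
'''


def rpc_roundtrip(message):
    """Send one JSON message to the child over stdin and read one reply from stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # One JSON object per line; flush so the child sees it immediately.
        proc.stdin.write(json.dumps(message) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
    finally:
        # Closing stdin ends the child's read loop so it can exit cleanly.
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return reply
```

An editor integration would keep the process alive and handle replies asynchronously, which is presumably how a permission prompt from the tool can be surfaced as a native-looking question.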
Yeah, right? Not one ever actually turned out to be true!
That conspiracy about billionaires, who supposedly own all of western media, having deliberately created an environment in which anyone who expresses even the remote idea of a conspiracy gets discredited, is also not true!
> We’ve been living in the enshittification economy.
That whiny bullshit about somebody else's website? You don't have to rely on a website or app. Either you need their monopoly because you can't do it yourself, or you have options... in both cases the whining is not needed.