Facebook crawls links in PDFs you send in Messenger

archie2 · on Oct 27, 2019

Facebook spies on everything you do. Stop using it. This is the only way this will end. It's not even a useful product. Stop being sheep.

harry8 · on Oct 27, 2019

I did, more than a decade ago. Do you think they stopped? How would I know there is no shadow profile being constantly updated?

It's apparently not my data and privacy and I have no say in it at all when someone uploads their goddamn contact list to these frickin' crooks. Is that your understanding too?

So this solution is not a solution. This solution only works if we stop other people from using it. How's that going to work? Well, it isn't.

Cool? Now welcome to the "what regulation is appropriate" debate because this is clearly an example of market failure, same as national defence, same as national parks, same as pollution, same as any other. The sooner we treat "...using computers" exactly the same as any other industry the better. This is nuts.

freeflight · on Oct 27, 2019

> It's apparently not my data and privacy and I have no say in it at all when someone uploads their goddamn contact list to these frickin' crooks.

Afaik that's actually the legal situation in the US due to the third-party doctrine [0]. It's one of the ways government mass surveillance has been "legalized" without having to comply with the Fourth Amendment.

[0] https://en.wikipedia.org/wiki/Third-party_doctrine

dropin685 · on Oct 27, 2019

> This solution is not a solution.

No, not a complete solution, but it's a step in the right direction. One of the reasons advertisers as well as ordinary users come to FB is to be in contact with you -- to have your attention. Stop using FB and you take that away. FB ends up with less of an audience with which to entice others.

harry8 · on Oct 27, 2019

There are enough users remaining that your shadow profile still has value, and Facebook marches on.

This solution is not a solution. Definitely do it, and you'll really enjoy doing it too - life improves! Just don't think it's any kind of solution to any of the systemic problems that are being discussed because it isn't. Not even a partial solution. Not even a meaningful start of one.

travisporter · on Oct 27, 2019

But if my Facebook profile is deactivated or deleted or whatever, and I’m no longer active, how can they still have information about me? I had about 300 “friends” on there but no one really tagged or mentioned me or anything.

biblytens · on Oct 27, 2019

What are you claiming is the market failure here?

harry8 · on Oct 27, 2019

Market Failure [1] is the definition in economics for this situation.

There is a non-price, non-market mechanism imposing a cost upon us. It's just like when your neighbour at the restaurant lights up a cigarette, you hate it, and it spoils your meal: you can't very well "give up smoking" to fix that. It's just like any public good, e.g. national defence: it can't be provided by a free market alone, and so it never is. It's just like a factory that opens and pollutes a river, killing all the fish: the fishermen can't stop transacting with the factory to get their livelihoods back. It's just like any natural monopoly, e.g. piped drinking water supply and distribution to buildings in a city, which is heavily regulated everywhere on earth (often really badly, sure), because you can't change providers at will and you can't do a startup and get into that market. Property rights and enforcement: as a rule we don't do private courts and police, and laws are not supposed to be for sale to the highest bidder.

Facebook have my data, they use it without my consent, and I have no way of taking my business elsewhere to remedy that. This is one of the many costs Facebook impose on all of us. These are costs that not all of us want to bear or feel we benefit from. Us stopping using Facebook does not remove that cost; we continue to incur it. That meets the textbook definition of "market failure".

Just by the by, this is not simply "my claim"; this is the normal definition of market failure in any economics textbook, regardless of political leaning. Whether you think this market failure is an important one certainly can be debated. E.g. it's market failure if I don't like the color my neighbor paints their front door, and the response to that is usually "so what, get over it." You can, and Facebook do, make that kind of argument here. Facebook PR will avoid the term "market failure", as it sounds somehow worse than a technical description of the situation to much of the public, as it perhaps does to yourself. But there's not much doubt (at least as far as I'm aware) that it meets the usual, well-understood definition among economists.

[1] https://en.wikipedia.org/wiki/Market_failure

archie2 · on Oct 27, 2019

> Just like if your neighbour at the restaurant lights up a cigarette and you hate that and it spoils your meal. You can't very well "give up smoking" to fix that.

But you can go to another restaurant that doesn't allow smoking. If enough people dislike having smokers around that those restaurants go out of business, then eventually you will see the market shift toward restaurants with non-smoking sections.

coldsnap427 · on Oct 27, 2019

This.

I quit several years ago, after being on the platform for nearly a decade, which I mainly used to communicate with classmates.

Over time, I realized the platform was only good for pushing false information and giving a platform to people who shouldn't be sharing their thoughts. I also became more aware that I was a product, and it infuriated me to no end.

I'm glad to not be on it anymore. What a waste of time.

djohnston · on Oct 26, 2019

And if they didn't the headline would read: "Facebook fails to stop malicious and illegal content from being shared on their Network! Should they be shut down?!"

Barrin92 · on Oct 26, 2019

This sounds like a strawman, to be honest, because I haven't heard anyone rant about illegal music in probably 15 years, and even then it was only ever politicians, not ordinary people.

If we're talking about really malicious stuff like child pornography, then in the context of file sharing these companies already have systems in place to identify such content, so a blanket ban on torrent files seems blatantly unnecessary.

wutbrodo · on Oct 26, 2019

> this sounds like a strawman to be honest because I haven't heard anyone rant about illegal music since probably 15 years

This is the real strawman, as nobody on this entire thread is talking about illegal music. On the other hand, there's a strong and persistent thread of calls for tech platforms like Facebook to control "malicious or illegal" information being spread on their platforms: an obvious example is the NZ shooter's manifesto + video.

jakeogh · on Oct 26, 2019

Twitch didn't get deplatformed because a mentally ill person streamed murder on it, and you won't hear a peep about the other big chan. That had nothing to do with "protecting" (seriously? wtf) people from reading some mentally ill person's note; it was a problem-reaction-solution to axe an inconvenient site.

egypturnash · on Oct 26, 2019

Malicious content could also include phishing and viruses.

fittslickare · on Oct 26, 2019

I really can't imagine that headline. We are talking about private messages.

stevewilhelm · on Oct 26, 2019

So we want Facebook to stop propagating fake news and hate speech and we want them to do so without crawling URLs?

djohnston · on Oct 27, 2019

Yes. Those are mutually exclusive.

javagram · on Oct 26, 2019

This will keep happening until they enable e2e.

I’ve had Facebook block several links sent in private message groups, to completely legal and safe sites (Messenger prints out an obscure API error and refuses to send the content). They have done this for a long time.

rohug · on Oct 26, 2019

Worth noting WhatsApp also provides link previews now. Although it is supposedly e2e communication, the link previews are likely generated by reaching out to a similar facebook unfurl service.

They can then have a single map of phone num -> links rendered between fb and whatsapp.

yawaramin · on Oct 26, 2019

WhatsApp fetches a link preview on the sender's device before the message is encrypted, and packages it up with the message before sending. Depending on how exactly they implement the fetch, they may or may not know what links you sent.
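A minimal sketch of the sender-side flow described above, assuming the preview is built from the page's OpenGraph `og:*` meta tags (an assumption; the thread doesn't say what WhatsApp parses). The HTML fetch itself is left out, and `build_payload` and its payload fields are hypothetical names, not WhatsApp's format.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags, which many link previews use."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a page repeats a property.
            self.og.setdefault(prop[3:], content)


def build_payload(text, url, html):
    """Bundle the message and its preview *before* encryption (hypothetical format)."""
    parser = OpenGraphParser()
    parser.feed(html)
    preview = {
        "url": url,
        "title": parser.og.get("title"),
        "description": parser.og.get("description"),
        "image": parser.og.get("image"),
    }
    # In a real client this whole dict would now be serialized and end-to-end
    # encrypted, so the server would only relay ciphertext.
    return {"text": text, "preview": preview}
```

Whether the provider learns the link then depends only on how the HTML was fetched (directly from the device, or via some proxy), which is the open question in the comment.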

Sami_Lehtinen · on Oct 26, 2019

At least links in PDF over WA didn't get visited from FB servers. I just personally tested it moments ago.

Thorrez · on Oct 27, 2019

What's visiting the links? Your phone?

exadeci · on Oct 28, 2019

Try sending a link immediately and you'll see that it doesn't get a preview.

Give it a few seconds after pasting it and you'll get the preview, because it's fetched from your device.

imanobody · on Oct 26, 2019

WhatsApp also scans PDF files you send to a contact. It's easy to confirm: get some random PDF with a Chinese filename and Chinese content. Send it to a contact. Watch the delay in send/receive. Now do the same for any random PDF that's all English. Watch the regular send/receive time occur.

The only conclusion: it takes a little time for a file that's flagged - based on its language - to pass the scanners?

yellow_lead · on Oct 26, 2019

I experienced this too: Facebook will block most torrent links, regardless of whether they're legal or not. I've taken to encoding them with Base64 first and instructing the recipient to decode them.
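For reference, the Base64 round trip is a one-liner either way. A minimal Python sketch (the URL is a placeholder); note this is obfuscation, not encryption: anyone, including an automated filter, can decode it.

```python
import base64


def encode_link(url: str) -> str:
    # Hides the literal "https://..." from naive pattern matching only.
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def decode_link(blob: str) -> str:
    # The recipient reverses it; so can any filter that bothers to try.
    return base64.b64decode(blob).decode("utf-8")


link = "https://example.com/file.torrent"  # placeholder URL
blob = encode_link(link)
assert decode_link(blob) == link
```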

BubRoss · on Oct 26, 2019

Why not just make it a broken link and tell them how to correct it?

hermanradtke · on Oct 26, 2019

Why not just use an e2e messaging service? If you can convince people to decode a link then I am sure you can convince them to use Signal.

yellow_lead · on Oct 26, 2019

This was just a quick fix, but I agree with you on the e2e messaging service. However, I do wish more e2e services like Telegram would open source their backend. Looks like Signal does now at least!

tiagod · on Oct 26, 2019

Telegram is barely e2e.

harry8 · on Oct 27, 2019

Messages in plain text on the server? That's /not/ e2e at all! That's just a con. Advice: avoid telegram completely, always.

Or is my news old and that's no longer the case?

7shhw · on Oct 27, 2019

Q: What is this ‘Encryption Key’ thing?

When a secret chat is created, the participating devices exchange encryption keys using the so-called Diffie-Hellman key exchange. After the secure end-to-end connection has been established, we generate a picture that visualizes the encryption key for your chat. You can then compare this image with the one your friend has — if the two images are the same, you can be sure that the secret chat is secure, and no man-in-the-middle attack can succeed.

Newer versions of Telegram apps will show a larger picture along with a textual representation of the key (this is not the key itself, of course!) when both participants are using an updated app.

Always compare visualizations using a channel that is known to be secure — it's safest if you do this in person, in an offline meeting with the conversation partner.

Q: Why not just make all chats ‘secret’?

All Telegram messages are always securely encrypted. Messages in Secret Chats use client-client encryption, while Cloud Chats use client-server/server-client encryption and are stored encrypted in the Telegram Cloud (more here). This enables your cloud messages to be both secure and immediately accessible from any of your devices – even if you lose your device altogether.

The problem of restoring access to your chat history on a newly connected device (e.g. when you lose your phone) does not have an elegant solution in the end-to-end encryption paradigm. At the same time, reliable backups are an essential feature for any mass-market messenger. To solve this problem, some applications (like Whatsapp and Viber) allow decryptable backups that put their users' privacy at risk – even if they do not enable backups themselves. Other apps ignore the need for backups altogether and fade into oblivion before ever reaching a million users.

We opted for a third approach by offering two distinct types of chats. Telegram disables default system backups and provides all users with an integrated security-focused backup solution in the form of Cloud Chats. Meanwhile, the separate entity of Secret Chats gives you full control over the data you do not want to be stored.

This allows Telegram to be widely adopted in broad circles, not just by activists and dissidents, so that the simple fact of using Telegram does not mark users as targets for heightened surveillance in certain countries. We are convinced that the separation of conversations into Cloud and Secret chats represents the most secure solution currently possible for a massively popular messaging application.

https://telegram.org/faq#q-what-is-this-encryption-key-thing
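To make the quoted FAQ concrete, here is a toy Diffie-Hellman exchange plus a hypothetical fingerprint rendering. The parameters are deliberately tiny and insecure (2**32 - 5 is prime but far too small), and the fingerprint format is made up; it is not Telegram's visualization. If a man in the middle swapped public keys, each side would derive a different secret and the fingerprints would not match, which is why the FAQ says to compare them over a trusted channel.

```python
import hashlib
import secrets

# Toy parameters for illustration only. Real systems use 2048-bit+ groups
# or elliptic curves such as X25519.
P = 2**32 - 5
G = 5


def keypair():
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P-2]
    public = pow(G, private, P)             # G^private mod P, sent in the clear
    return private, public


def shared_secret(my_private, their_public):
    # (G^b)^a == (G^a)^b mod P, so both sides derive the same value.
    return pow(their_public, my_private, P)


def fingerprint(secret):
    # Hypothetical human-comparable rendering of the shared key.
    digest = hashlib.sha256(secret.to_bytes(4, "big")).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, 16, 4))


alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
# Each side computes this locally and compares it in person.
assert fingerprint(shared_secret(alice_priv, bob_pub)) == fingerprint(shared_secret(bob_priv, alice_pub))
```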

tiagod · on Oct 27, 2019

Telegram supports "secret chats", which only work on a single device and for private conversations.

Every other telegram feature has no security.

datameta · on Oct 26, 2019

I imagine it is quite easy to reassemble a broken link with some extra whitespace or random characters (unless you really scramble it, which makes manually "decoding" it tedious). At that point you might as well automate the process and use Base64.
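A sketch of why trivially broken links don't survive: a filter only needs a few normalization passes before matching. The substitutions below are hypothetical examples of common "defanging" styles, not Facebook's actual rules, and `normalize` assumes it is applied to a single token, since it strips all whitespace.

```python
import re

# Hypothetical "re-fanging" passes a link filter might apply.
SUBSTITUTIONS = [
    (re.compile(r"\s+"), ""),                                           # inserted whitespace
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]|\(dot\)", re.IGNORECASE), "."),  # [.] (dot) etc.
    (re.compile(r"^hxxp", re.IGNORECASE), "http"),                      # hxxp:// defanging
]


def normalize(text: str) -> str:
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return text


def looks_like_url(text: str) -> bool:
    return normalize(text).lower().startswith(("http://", "https://"))
```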

yellow_lead · on Oct 26, 2019

This. I believe I tried that, but they were able to reconstruct the links, at least when the breakage was trivial.

9HZZRfNlpR · on Oct 26, 2019

I have had similar experiences, numerous ones to be exact. The latest was a 10-year-old WordPress blog on a WordPress.com subdomain, definitely not hacked. It was about science, specifically neurology.

TooCreative · on Oct 26, 2019

e2e would not necessarily stop it. Since FB controls the apps that send and receive the message, they can do whatever they want to the unencrypted message on both sides.

espes · on Oct 26, 2019

You can choose to enable e2e on Messenger

SlowRobotAhead · on Oct 26, 2019

Because the key, nonce, result, and keyshare or Diffie-Hellman exchange are all done inside of messenger... why would anyone believe this is legit?

It might be, IDK, but if it’s all inside their system, how could you audit that?

heynk · on Oct 26, 2019

Couldn't you sort of test this by enabling E2E, sending a link that was previously blocked, and seeing if it is still blocked? That would at least show some sign if it's all a sham or not.

jacquesm · on Oct 26, 2019

That would guarantee absolutely nothing.

0xffff2 · on Oct 26, 2019

If the link was still blocked it would guarantee that Facebook is still eavesdropping.

SlowRobotAhead · on Oct 26, 2019

Other guy was right. Think about this easy scenario

  if (E2E_ENABLED) {
    SkipCrawler();
    SkipContentChecks();
  }

BubRoss · on Oct 26, 2019

Again, it isn't to prove the encryption works, it is only a test that could prove that it doesn't work.

SlowRobotAhead · on Oct 26, 2019

Ah, yea, I got the argument backwards. I thought he was saying if content wasn’t blocked that it proved encryption worked. Can’t prove the negative.

heynk · on Oct 26, 2019

Yes, totally understood. I am just thinking in line with a different response that this could be an easy way to prove if they’re still snooping - not a guarantee that they aren’t.

cmroanirgo · on Oct 26, 2019

Agreed. There is nothing stopping the sender's app from parsing and reporting URLs in any and all content before e2e occurs... Even to FB servers
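A minimal sketch of that point, with hypothetical names throughout: `encrypt` stands in for a real e2e cipher and `report` for any side channel. It is not a claim about what Facebook's client does, only an illustration that e2e protects the transport, not the endpoints.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def send_message(plaintext, encrypt, report):
    """Hypothetical client send path.

    `encrypt` and `report` are placeholders, not a real API.
    """
    # The sender's app sees the plaintext, so it can inspect it freely...
    for url in URL_RE.findall(plaintext):
        report(url)
    # ...and only what goes through encrypt() is protected in transit.
    return encrypt(plaintext)
```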

wskinner · on Oct 26, 2019

This argument applies to any messenger app that claims e2e encryption. You could build signal from source. But how much do you trust your compiler?

gbear605 · on Oct 26, 2019

I trust my compiler more than Facebook

braythwayt · on Oct 27, 2019

For any value of “compiler.”

Avamander · on Oct 27, 2019

Can you actually use Signal built from source with the official servers? Anyway, we have open-source chat platforms that have been audited by independent third parties on one side, and megacorporations' closed-source, unaudited chat software on the other. Point being, why would you argue for using the bigger "evil"?

kerng · on Oct 26, 2019

Same can be done if e2e is enabled. Nothing prevents Facebook from sending links from client to a "validation" service.

They do this already in WhatsApp for instance.

JadeNB · on Oct 26, 2019

Maybe I'm being foolish, but isn't the point of e2e that Facebook wouldn't even know what you were sending (a link or otherwise), it being encrypted in flight?

roywiggins · on Oct 26, 2019

The WhatsApp client knows, since that's the "end." Nothing technical stops Facebook from bundling some code in the client to pass data about the messages back to a central server.

progval · on Oct 26, 2019

Yes, but it's hard to trust e2e when keys go through a blackbox (Facebook's API), and clients are controlled by Facebook (closed source applications and JS).

It's better than no encryption, but not what technical people usually mean by e2e encryption

kerng · on Oct 27, 2019

Facebook can send the URLs from client to their service, that just bypasses the e2e channel and opens up a nice side channel for Facebook to peek into messages.

steelframe · on Oct 26, 2019

My company's malware detection crap on my work laptop once scanned a PDF of a security research paper I was reading for my project, found a link to a web site with malware on it because that's what the research was about, and then it summarily deleted the PDF to "protect" the company from that link.

TwoBit · on Oct 26, 2019

Well it wasn't entirely wrong. Just because the doc was about malware doesn't mean somebody won't accidentally click it.

Spivak · on Oct 26, 2019

I mean, how is the security scanner supposed to know that you're working on a project that is a super-specific edge case?

I mean almost all of the time that PDF will be malware designed to trick the reader into clicking that link and it did the right thing.

archie2 · on Oct 27, 2019

Most security scanners allow you to create exclusion folders, where it doesn't scan files in those folders. Something somebody researching malware on a company computer with a malware detector should probably be aware of.

RaiseProfits · on Oct 26, 2019

Presumably by disabling it on the workstations of security researchers. Or just the feature that flags linked content rather than content itself.

badrabbit · on Oct 26, 2019

Not in the least surprising. I wouldn't be surprised if Gmail does this too, to "detect phishing" (PDFs containing phishing links are common). There's always a plausible reason they can use.

supernova87a · on Oct 26, 2019

There's no surprise. Gmail does.

If you search for a text string in Gmail, it will return emails that contain that text only in scanned images or PDFs that are in your mailbox.

odensc · on Oct 26, 2019

That doesn't mean they're crawling the URLs, just that they're indexing the content of PDFs/images for your search. Which is, honestly, a pretty useful feature. Whereas Facebook is doing this without providing any value to the user.

badrabbit · on Oct 27, 2019

I've worked with email security appliances that will visit the URLs in a PDF when detonating the attachment in a sandbox. Dynamic analysis helps find 0-days; sometimes the link leads to something worse, and often the link is a URL shortener too.

criddell · on Oct 26, 2019

Microsoft does this with Skype too. They say it's for detecting malicious links.

knzhou · on Oct 26, 2019

As always in big tech, you're damned if you do and damned if you don't.

st0le · on Oct 26, 2019

Honestly, this is good for preventing malware, but I imagine it breaks a bunch of things, e.g. if the link has a limited visit count. The link will "expire" before the recipient gets a chance to view it.

Nextgrid · on Oct 26, 2019

To be fair, an HTTP GET request should never modify the state of the system - hitting a link should not change anything.

If you need to expire links then make the initial link display a form with a submit button (which does a POST) to reveal the content (and expire the link). Legitimate crawlers don’t submit forms so it should be safe.
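A minimal WSGI sketch of that pattern, assuming an in-memory store (`LINKS`, `create_link`, and `request` are hypothetical names). GET only renders a form; only the POST both reveals and expires the content, so a preview crawler's GET leaves the link intact.

```python
import secrets

LINKS = {}  # hypothetical in-memory store: token -> one-time content


def create_link(content):
    token = secrets.token_urlsafe(8)
    LINKS[token] = content
    return token


def app(environ, start_response):
    token = environ.get("PATH_INFO", "/").lstrip("/")
    method = environ.get("REQUEST_METHOD", "GET")
    if token not in LINKS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Link expired or unknown\n"]
    if method == "GET":
        # Safe method: render a form, change nothing. Crawlers stop here.
        form = f'<form method="post" action="/{token}"><button>Reveal</button></form>'
        start_response("200 OK", [("Content-Type", "text/html")])
        return [form.encode()]
    if method == "POST":
        # The state change lives here: pop() reveals and expires in one step.
        content = LINKS.pop(token)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [content.encode()]
    start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
    return [b""]


def request(method, path):
    """Tiny in-process client for trying the app without a server."""
    status = []
    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path},
                        lambda s, headers: status.append(s)))
    return status[0], body
```

A real deployment would also want CSRF protection on the form and a persistent store; both are omitted here.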

dataflow · on Oct 26, 2019

> To be fair, an HTTP GET request should never modify the state of the system

In theory. But that's not how the world I live in seems to work.

bagacrap · on Oct 26, 2019

I think it's pretty common practice. Otherwise search engine web crawlers would be wreaking havoc.

dataflow · on Oct 26, 2019

No, both your logic and your premise are incorrect. To give just one example, rate limiting is a widespread stateful practice applied to GET requests, and it doesn't cause web crawlers to wreak havoc on anything.
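A minimal fixed-window limiter sketch illustrating the point: every GET increments a per-client counter, so the request is stateful on the server, yet a well-behaved crawler is unaffected as long as it stays under the limit. The class name and limits are illustrative, and old windows are never pruned in this sketch.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)  # (client, window index) -> requests seen

    def allow(self, client_ip):
        key = (client_ip, int(self.clock() // self.window))
        self.counts[key] += 1  # every GET mutates server-side state
        return self.counts[key] <= self.limit
```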

feanaro · on Oct 28, 2019

I have absolutely no problems with the don't. I don't think any central body should be responsible for policing my private conversations and it just seems like a convenient excuse for these companies to perpetuate the surveillance.

barney54 · on Oct 26, 2019

Oh, I don’t know. I refuse to use Facebook Messenger and I have yet to be damned (at least to my knowledge)!

knzhou · on Oct 26, 2019

I meant from the perspective of tech employees that have to decide whether to add this kind of monitoring.

journalctl · on Oct 26, 2019

So they might as well not; at least then we get some modicum of privacy.

wutbrodo · on Oct 26, 2019

This is certainly how I feel, but "damned if you do and damned if you don't" doesn't imply that the level of damnedness is equal. The balance between fettering "malicious" speech/activity and preserving privacy seems to be strongly tilting in the mainstream towards the former recently; "tech platforms have a responsibility to heavily police the content on their systems" is apparently a lot more resonant with most people than "tech platforms should preserve the privacy of their users".

amelius · on Oct 26, 2019

Why not simply ask the user then?

grumblepeet · on Oct 27, 2019

And all email into Office 365 gets pulled into a sandboxed environment where items such as PDFs, in fact all attachments, are ‘exploded’ and all the links investigated, i.e. executed and checked for malicious endpoints or payloads. I was in a meeting recently with someone from Microsoft who explained this (I might be misrepresenting what she said; she was an expert in this field, and I'm definitely not). I was shocked, though, at the system's capability to examine content to such an extent.

Gustomaximus · on Oct 27, 2019

If Microsoft really wanted to help Skype security, it would be pretty easy to realise an account has been hacked when someone suddenly messages a link to their entire contact list, most of whom they haven't messaged in 10 years.

The number of times I've gotten those obviously spammy links from people I haven't talked to in a decade plus... It can't be hard to red-flag these.
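A sketch of the heuristic described above, under stated assumptions: a "burst" is the set of contacts who received the same message in a short window, and every threshold below is a made-up default rather than anything Skype is known to use.

```python
def looks_compromised(recipients, has_link, days_since_last_chat,
                      min_recipients=20, dormant_after_days=365,
                      dormant_share=0.8):
    """Hypothetical heuristic for a burst of identical messages.

    recipients: contacts who got the same message in a short window.
    days_since_last_chat: contact -> days since the sender last messaged
    them (a missing entry means never).
    """
    if not has_link or len(recipients) < min_recipients:
        return False
    dormant = sum(
        1 for r in recipients
        if days_since_last_chat.get(r, float("inf")) >= dormant_after_days
    )
    return dormant / len(recipients) >= dormant_share
```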

CydeWeys · on Oct 26, 2019

And I do appreciate that they're doing that even. I want that, just like I want spam filtering on my email.

It's what else might be going on with the link analysis that's worrisome.

_dujt · on Oct 26, 2019

Just remember that almost every feature that’s “announced” is a masquerade for ad-tech software to do its thing.

Example, Facebook asking for phone numbers in the name of “security” when they don’t give a shit about security. They wanted to tie a phone number to the owner, and create a social graph based on their contact uploads.

worldofmatthew · on Oct 26, 2019

This should not be news to anyone. Facebook scans all links posted in Messenger.

sushid · on Oct 26, 2019

This is about links INSIDE a PDF. That's one step further than most people assumed.

manojlds · on Oct 26, 2019

Mostly to scan the PDF and ensure it's safe, I believe, or at least that would be the stated reason.

strstr · on Oct 26, 2019

I can believe that (despite the obvious creepiness), since FB users have a knack for getting owned and spreading that ownage to other users.

gryffin · on Oct 26, 2019

PDF is an open format. Scanning PDF for the links is probably the least you could assume.

You gotta assume they will reconstruct generations of your family tree if they can from your chat history.
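For a sense of how little effort the link extraction takes: link annotations in a PDF typically carry a `/URI (...)` action. A rough, stdlib-only sketch that scans the raw bytes and also inflates FlateDecode streams (where newer PDFs often keep their objects). It ignores escaped parentheses, hex strings, and other stream filters, so a real scanner would use a proper PDF parser.

```python
import re
import zlib

URI_RE = re.compile(rb"/URI\s*\((.*?)\)", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def extract_uris(pdf_bytes):
    """Rough link extraction: look for /URI (...) in raw and inflated data."""
    chunks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            chunks.append(zlib.decompress(match.group(1)))
        except zlib.error:
            pass  # not FlateDecode (or damaged); skipped in this sketch
    uris = []
    for chunk in chunks:
        for match in URI_RE.finditer(chunk):
            uri = match.group(1).decode("latin-1")
            if uri not in uris:
                uris.append(uri)
    return uris
```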

twothamendment · on Oct 27, 2019

"...reconstruct generations of your family tree if they can from your chat history."

People might pay good money for that!

nitwit005 · on Oct 27, 2019

Then people have strange assumptions.

Once a service starts blocking malicious links, the obvious next step is to conceal the link through some sort of indirection.

joshspankit · on Oct 26, 2019

I would also assume they are scanning links in zip files and (if anyone is crazy enough to do this) links visible in images

axegon_ · on Oct 26, 2019

This. I honestly don't get why this is news. I truly hate facebook with a passion, I really do. But on this occasion I don't really blame them: You know what you are getting yourself into, what did you expect? A tuna salad? You shouldn't really be sharing any personal information on any platform which you can't hold accountable, regardless of e2e encryption.

gravypod · on Oct 26, 2019

Could someone effectively DoS another site using this method by including a bunch of links that generate a lot of load?

Would be interesting to see if Facebook has a maximum number of links it'll follow.

GhostVII · on Oct 26, 2019

The cost of uploading a PDF of links is probably not much less than the cost of following those links on your own. So I don't think you gain much by leveraging Facebook in this case.

ahbyb · on Oct 26, 2019

What if I create a PDF with this content...

https://news.ycombinator.com/item?id=1

https://news.ycombinator.com/item?id=2

https://news.ycombinator.com/item?id=3

and so on, until 10,000,000? Perhaps Facebook starts opening every link using 10,000 parallel threads. Can you really replicate that from your connection at home? Perhaps even the sysadmin of your victim site has whitelisted all Facebook IP addresses so their crawlers get a free ride.

gojomo · on Oct 26, 2019

There's a fair chance the service that generates these outbound requests throttles itself, limiting how many requests it makes against any one domain/IP, or how many errors it will provoke per time period before human review.
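A sketch of the kind of per-host throttling suggested above; the function and the one-second interval are hypothetical. Under such a scheme, even at one request per second per host, 10,000,000 links to a single host would take about 116 days, so the parallel flood in the previous comment would not materialize.

```python
from urllib.parse import urlsplit


def schedule(urls, min_interval, start=0.0):
    """Return (time, url) pairs with same-host requests at least min_interval apart."""
    next_free = {}  # host -> earliest time the next request to it may start
    plan = []
    for url in urls:
        host = urlsplit(url).hostname
        t = max(start, next_free.get(host, start))
        plan.append((t, url))
        next_free[host] = t + min_interval
    return plan
```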

_egnm · on Oct 26, 2019

Maybe, maybe not

GhostVII · on Oct 26, 2019

I would imagine it queries sequentially so it wouldn't have too many parallel threads. Not much reason to parallelize.

_egnm · on Oct 26, 2019

It's this kind of thinking that unearths the bugs

_underfl0w_ · on Oct 26, 2019

I'm sure you could just point all those links to a domain you own (on some poor, unsuspecting VPS) and see what happens? Might be playing with fire, though.

ahbyb · on Oct 26, 2019

I remember something like this was possible back in the day with Google Sheets. You could embed a URL as if it was an image in each cell of a sheet and it would make thousands of requests. I don't remember the details.

cj · on Oct 26, 2019

https://news.ycombinator.com/item?id=3890328

buboard · on Oct 26, 2019

Side note: I wonder why FB doesn't launch a search engine, since they crawl most of the web anyway.

saagarjha · on Oct 26, 2019

That would let people leave the Facebook platform and explore the open web.

kortex · on Oct 26, 2019

I remember the sheer awe when I first learned there was a huge open web outside of AOL. I'm sure people nowadays are aware of the rest of the web, but if the draw is minimal, they will likely get stuck in the same loops of well-trodden space.

sodosopa · on Oct 26, 2019

A lot of the same people who kept that AOL walled garden alive, just migrated to Facebook. To them Facebook is the web, the restaurants they like are there, the tired celebs they worship are there and whatever crazed conspiracy theories someone told them at work or at church are under Facebook News. It's comfortable and I agree with you and like how you phrased it as "the same loops of well-trodden space."

progval · on Oct 26, 2019

People follow celebrities on Facebook? Don't they use Twitter for that?

symlinkk · on Oct 26, 2019

Sort of like how all of us check Hacker News daily? Don’t act like we’re any better

saagarjha · on Oct 26, 2019

I go to websites other than Hacker News.

skinnymuch · on Oct 28, 2019

Yeah those FB people don’t go to other sites! /sarcasm

buboard · on Oct 26, 2019

I don't think they're worried much about that anymore; people always return, and they've established their position. OTOH, it would be nice for Google to have some serious competition on the web, especially considering that FB has a great NLP AI team.

progval · on Oct 26, 2019

They could create their own AMP-like tech to keep people on their website/application

EDIT: Actually they already did, it's called Facebook Instant Articles.

tantalor · on Oct 26, 2019

Facebook is America Online.

catalogia · on Oct 26, 2019

> "According to legend, upon hearing of Colt's entrance into the lever-action rifle market, Winchester began to develop a prototype revolver to compete with Colt's market. A "gentleman's agreement" then followed between Colt and Winchester, with Colt agreeing to drop production of the Burgess and Winchester abandoning its plans to develop a revolver. The truth of this story has never been fully verified, and as such, the reason for the Burgess rifle's short production history is unknown.[1][4][5]"

https://en.wikipedia.org/wiki/Colt-Burgess_rifle

bagacrap · on Oct 26, 2019

There is a search bar in Facebook. That searches Facebook, which Facebook would tell you is all the web you need.

amethyst · on Oct 26, 2019

They did that, and it wasn't successful.

mikorym · on Oct 26, 2019

You can also use canary tokens for this. [1]

I am not personally affiliated with them, but I believe they are South African.

[1] https://www.canarytokens.org/generate

egypturnash · on Oct 26, 2019

The galaxy brain maneuver here is to start trying to fuzz whatever machines Facebook is using to do this.

h1fra · on Oct 26, 2019

While I don't like FB, this is not something specific to PDFs; they provide quick previews for all links, like almost all social networks.

lostmyoldone · on Oct 26, 2019

This isn't about rendering a preview; it's about following links inside the PDF. While there may be a case for doing this to prevent phishing and malware attacks, they really should ask/tell the user they are doing it!

dontbenebby · on Oct 26, 2019

This feels like it could be an attack vector. Gather intel on what the user agent is, nmap the IP, possibly find a vulnerability in the parser or the server.

ressetera · on Oct 26, 2019

I doubt the downloader isn't restricted in some kind of jail.

Enginerrrd · on Oct 26, 2019

You'd think... but after that story about Microsoft just executing random threatening code it found on someone's computer and allowing it access to the internet, I have to question some of the wisdom these big companies show.

ahbyb · on Oct 26, 2019

That's assuming Microsoft didn't do things properly. Who's to say the number of connections that could be opened, the bandwidth, or the max traffic that could be received/sent wasn't limited?

Thinking that you can make an HTTP request using this method and that that means you can unleash a DoS is... worth a try, but not something you can take for granted.

lostmsu · on Oct 26, 2019

You can just point the link to your own server. That will tell you Facebook's IP, at the very least.
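A minimal stdlib sketch of that experiment: a server that records the client IP, `User-Agent`, and path of every GET. `facebookexternalhit` is the user agent Facebook documents for its link-preview crawler; whether the Messenger PDF scanner sends it is not something the thread establishes. `start_server` and `simulate_visit` are hypothetical helper names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

HITS = []  # (client_ip, user_agent, path) for every request received


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        HITS.append((self.client_address[0],
                     self.headers.get("User-Agent", ""),
                     self.path))
        self.send_response(204)  # No Content; we only care who asked
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence default stderr logging; HITS keeps the data


def start_server(host="127.0.0.1", port=0):
    # On a public VPS you would bind "0.0.0.0" and a fixed port instead.
    server = HTTPServer((host, port), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def simulate_visit(url, user_agent):
    """Stand-in for a crawler hitting the link embedded in your PDF."""
    # Bypass any configured proxy so the request goes straight to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(req, timeout=5) as resp:
        return resp.status
```

Giving each PDF a unique path (e.g. `/canary-1`) tells you which document triggered which visit.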

crazygringo · on Oct 26, 2019

Huh, but why?

I can totally understand scanning a PDF for links to look for malicious links to protect users.

But that wouldn't involve actual HTTP requests to them.

I'm struggling to imagine what purpose this could have.

mdasen · on Oct 26, 2019

How do you know if they're malicious if you don't make HTTP requests to them?

One of the things that phishers and others do is use link wrapping and other services to hide malicious links. So, I get something.wordpress.com/something-clean. I then put in an HTML or JS redirect on that page to something malicious. Given that browsers don't warn about HTTP, HTML, or JS redirects, it's an easy way for scammers to get around a list of malicious pages.

These kinds of attacks are very common in the email space.
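Following those hops is what forces a scanner to actually make requests. A minimal sketch of redirect-chain resolution that handles HTTP 3xx `Location` headers, HTML meta refresh, and the simplest JavaScript `location` assignments. The `fetch` callable and the `Response` type are hypothetical stand-ins for a real HTTP client, and the regexes only cover the plain cases; obfuscated JS would slip past them.

```python
import re
from typing import Callable, NamedTuple, Optional
from urllib.parse import urljoin


class Response(NamedTuple):
    status: int
    headers: dict[str, str]  # a real client would treat header names case-insensitively
    body: str


# <meta http-equiv="refresh" content="0; url=https://...">
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)
# window.location = "...", location.href = '...', or location.replace("...")
JS_REDIRECT = re.compile(
    r'(?:window\.)?location(?:\.href)?\s*=\s*["\']([^"\']+)["\']'
    r'|location\.replace\(\s*["\']([^"\']+)["\']\s*\)'
)


def next_hop(url: str, resp: Response) -> Optional[str]:
    """Return the URL this response sends the browser to, or None if it is a final page."""
    if 300 <= resp.status < 400 and "Location" in resp.headers:
        return urljoin(url, resp.headers["Location"])
    m = META_REFRESH.search(resp.body)
    if m:
        return urljoin(url, m.group(1))
    m = JS_REDIRECT.search(resp.body)
    if m:
        return urljoin(url, m.group(1) or m.group(2))
    return None


def resolve_chain(url: str, fetch: Callable[[str], Response], max_hops: int = 10) -> list[str]:
    """Follow HTTP, HTML, and simple JS redirects; return every URL visited, in order."""
    chain = [url]
    for _ in range(max_hops):
        target = next_hop(chain[-1], fetch(chain[-1]))
        if target is None or target in chain:  # final page, or a redirect loop
            break
        chain.append(target)
    return chain
```

The last URL in the chain is the one worth checking against a blocklist; the clean-looking first hop on its own tells you nothing.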

gruez · on Oct 26, 2019

But in this case, that doesn't help at all because Facebook's crawler uses a predictable user agent string. You give a clean result to the Facebook crawler and a malicious result to everyone else.

seanieb · on Oct 26, 2019

There are services that crawl for you from multiple IPs and user agents, just for situations like this.

idoco · on Oct 26, 2019

That is a very good point. Security crawlers should probably use a masked user-agent.

marksomnian · on Oct 26, 2019

I'm fairly sure Google's search crawler already uses a masked UA, to detect when pages serve it different content than they do to users.

allie1 · on Oct 26, 2019

Not always; it masks UAs and IPs when checking ad content to uncover cloakers, so it's within their codebase to do this. Not sure why they're not using it here.
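The masked-fetch idea can be sketched as fetching the same URL twice, once announcing the crawler and once posing as an ordinary browser, and comparing the results. The `fetch` callable is a hypothetical stand-in for a real HTTP client; the crawler string is the User-Agent Facebook documents for its link fetcher, and the browser string is only an example.

```python
from typing import Callable

# User-Agent Facebook's link crawler announces itself with.
CRAWLER_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
# Example desktop browser UA; any realistic browser string would do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/78.0 Safari/537.36"
)

Fetcher = Callable[[str, str], str]  # (url, user_agent) -> response body


def looks_cloaked(url: str, fetch: Fetcher) -> bool:
    """True if the site serves the declared crawler something different
    from what it serves a normal browser, i.e. the cloaking trick above."""
    as_crawler = fetch(url, CRAWLER_UA)
    as_browser = fetch(url, BROWSER_UA)
    return as_crawler != as_browser
```

Byte-for-byte comparison is naive, since many pages embed timestamps or nonces, and cloakers also key on IP ranges, so a real check would vary the source IP too and compare normalized content or the final landing URL.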

bluntfang · on Oct 26, 2019

>How do you know if they're malicious if you don't make HTTP requests to them?

look-alike domains are a phishing vector that doesn't require you to make an HTTP request.
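Checking for look-alike domains needs no network access at all. A minimal sketch, assuming a hypothetical `PROTECTED` list and a plain edit-distance threshold; it catches typosquats like swapped or substituted characters but ignores Unicode homoglyphs, which real systems also have to handle.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Hypothetical list of domains worth protecting.
PROTECTED = ["facebook.com", "paypal.com", "google.com"]


def lookalike_of(domain: str, max_distance: int = 2) -> Optional[str]:
    """Return the protected domain this one imitates, or None.

    Distance 0 is the real site, not a look-alike, so it is not flagged.
    """
    domain = domain.lower()
    for target in PROTECTED:
        if 0 < levenshtein(domain, target) <= max_distance:
            return target
    return None
```

For example, `lookalike_of("paypa1.com")` returns `"paypal.com"`, while the genuine `"paypal.com"` returns `None`.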

TazeTSchnitzel · on Oct 26, 2019

The malicious links could be camouflaged behind a redirect.

allie1 · on Oct 26, 2019

Could they be collecting the links so that, if a user blocks the sender after opening the PDF and this happens at scale, they can infer it was one of the links and start blocking them?

Or link support requests to people who received a certain link via message.

So basically data mining to feed a model that takes future actions into consideration.

danielfoster · on Oct 26, 2019

Probably anti-spam, particularly to catch groups of fake accounts sending the same or similar PDF.
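A minimal sketch of that kind of check, with entirely hypothetical inputs and names: fingerprint the set of links extracted from each PDF, then flag fingerprints sent by several distinct accounts. Exact hashing only catches identical link sets; catching merely similar PDFs would need fuzzy matching, which this leaves out.

```python
import hashlib
from collections import defaultdict


def link_fingerprint(links: list[str]) -> str:
    """Order- and duplicate-insensitive fingerprint of the links found in one PDF."""
    canonical = "\n".join(sorted({link.strip() for link in links}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def suspicious_clusters(
    messages: list[tuple[str, list[str]]], min_senders: int = 3
) -> dict[str, set[str]]:
    """Group (sender_id, links_in_pdf) pairs by fingerprint.

    Keeps only fingerprints sent by at least `min_senders` distinct accounts,
    the pattern a ring of fake accounts pushing the same PDF would leave.
    """
    senders_by_fp: dict[str, set[str]] = defaultdict(set)
    for sender, links in messages:
        senders_by_fp[link_fingerprint(links)].add(sender)
    return {fp: s for fp, s in senders_by_fp.items() if len(s) >= min_senders}
```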

austhrow743 · on Oct 26, 2019

How do you check if a link is serving up something terrible without http requests to them?

tialaramex · on Oct 26, 2019

You _could_ ask a service like Google Safe Browsing.

Just in case you didn't follow any of the previous HN discussions of how that's done:

Consider the URL https://accounts.example.com/tmp/badmojo.exe

You (Facebook in this case) run a hypothetical method SafeSearch('accounts.example.com') and also SafeSearch('example.com') and SafeSearch('accounts.example.com/tmp') and SafeSearch('accounts.example.com/tmp/badmojo.exe')
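The Safe Browsing v4 documentation describes how a client turns one URL into several lookup strings. The list above is close, but the published scheme combines every host suffix with every path prefix and gives directory prefixes a trailing slash. A simplified sketch that skips URL canonicalization; the function names are illustrative, not part of any Google library.

```python
import ipaddress
from urllib.parse import urlsplit


def host_suffixes(host: str) -> list[str]:
    """Exact host first, then up to 4 suffixes built from the last 5 labels.

    The bare top-level domain is never tried, and IP addresses are used as-is.
    """
    try:
        ipaddress.ip_address(host)
        return [host]
    except ValueError:
        pass
    tail = host.split(".")[-5:]
    out = [host]
    for i in range(len(tail) - 1):  # stop before the bare TLD
        candidate = ".".join(tail[i:])
        if candidate not in out:
            out.append(candidate)
    return out


def path_prefixes(path: str, query: str) -> list[str]:
    """Exact path with query, exact path without it, then '/' plus up to 3 directory prefixes."""
    path = path or "/"
    out = [f"{path}?{query}"] if query else []
    out.append(path)
    directories = path.split("/")[1:-1]  # components before the final segment
    prefix = "/"
    candidates = [prefix]
    for segment in directories[:3]:
        prefix += segment + "/"
        candidates.append(prefix)
    for candidate in candidates:
        if candidate not in out:
            out.append(candidate)
    return out


def lookup_expressions(url: str) -> list[str]:
    """Every host-suffix/path-prefix combination to hash for one URL."""
    parts = urlsplit(url)
    return [host + path
            for host in host_suffixes(parts.hostname or "")
            for path in path_prefixes(parts.path, parts.query)]
```

For the example URL this yields `accounts.example.com/tmp/badmojo.exe`, `accounts.example.com/`, `accounts.example.com/tmp/`, and the same three paths under `example.com`.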

SafeSearch(string) is defined as follows. You compute SHA-256(string); that's your hash. You compare the start of this hash against a huge list of prefixes that Google provides, which you refresh every few minutes. If there's no match, fine, you're done. If there's a match, you ask Google: "I saw this prefix you sent me; which hashes should I be scared of?" Google gives you a list of full hashes with that prefix. If your hash is in this new list, the original URL was scary and you warn users not to visit it; otherwise you continue what you were doing.
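The two-step check described above can be sketched like this. `fetch_full_hashes` is a hypothetical stand-in for the network call to Google, and the fixed 4-byte prefix length is a simplification (the Safe Browsing v4 API allows prefixes of 4 to 32 bytes).

```python
import hashlib
from typing import Callable, Iterable


def is_listed(
    expressions: Iterable[str],
    local_prefixes: set[bytes],
    fetch_full_hashes: Callable[[bytes], set[bytes]],
    prefix_len: int = 4,
) -> bool:
    """Hash-prefix lookup.

    Step 1 is purely local: SHA-256 each expression and look up its first
    `prefix_len` bytes in the cached prefix set. Only on a prefix hit does
    step 2 ask the server for every full hash sharing that prefix, so the
    server learns the prefix but never the URL itself.
    """
    for expr in expressions:
        full = hashlib.sha256(expr.encode("utf-8")).digest()
        prefix = full[:prefix_len]
        if prefix in local_prefixes and full in fetch_full_hashes(prefix):
            return True
    return False
```

The design point is that the common case (a clean URL) never leaves the client; the network round trip only happens on the rare prefix hit.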

gojomo · on Oct 26, 2019

Sure, but this will only work for previously known threats, for which someone else (presumably Google) has already done the request, analysis, and determination.

I doubt Facebook only wants to detect old threats, reliant on a competitor's standards & practices.

Frost1x · on Oct 26, 2019

The obvious argument is they need to scan pages linked for malware and couldn't rely on a white/black list.

I'm sure if they're pulling data to do this analysis, it's not the only analysis they're doing.

w1nst0nsm1th · on Oct 27, 2019

There is a way to send any type of file through Messenger without Facebook snooping into your private life:

1. Compress the file you want to send into a password-protected zip.
2. Change the extension of the zip file from .zip to .txt.
3. Send the file through Messenger.

I've already done this to send a macOS application to a friend. To get around the size restriction, compress it into several zip parts, rename their extensions to .txt, and send them.

File size can be as high as 50MB+.
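The splitting step can be sketched in Python. This assumes the archive has already been password-protected with an external tool (Python's standard `zipfile` module can read encrypted zips but cannot create them), so the sketch only cuts the bytes into numbered `.txt` parts and joins them back. The part size is whatever stays under the upload limit, and the `.partNNN.txt` naming is hypothetical.

```python
from pathlib import Path


def split_into_txt_parts(src: Path, out_dir: Path, part_size: int) -> list[Path]:
    """Cut `src` into chunks named <name>.partNNN.txt, each at most `part_size` bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while chunk := f.read(part_size):
            part = out_dir / f"{src.name}.part{index:03d}.txt"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_txt_parts(parts: list[Path], dest: Path) -> None:
    """Concatenate the chunks back in order; zero-padded numbers make sorting correct."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            out.write(part.read_bytes())
```

The recipient runs `join_txt_parts` on the received files, writing to a `.zip` destination, and opens it with the password. The protection comes from the zip password; the rename only changes the extension, not the bytes.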

lukeschlather · on Oct 26, 2019

Are there any comparable hosted messaging services that don't do this?

s09dfhks · on Oct 26, 2019

Signal/Telegram

tialaramex · on Oct 26, 2019

Signal actually does optionally offer previews for a handful of services, and they really jump through some hoops to make that safer:

https://support.signal.org/hc/en-us/articles/360022474332-Li...

The service being previewed doesn't know who you are because Signal acts as a proxy, and Signal doesn't know what you previewed on that service because their client deliberately sends overlapping Range requests so that the preview size is rounded.
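The comment doesn't spell out the exact scheme, so this is a hypothetical sketch of the general idea, not Signal's code: request the resource in Range requests that are all exactly `block` bytes long, and when the last block would come up short, shift it back so it overlaps the previous one. An observer of the encrypted traffic then sees uniform block-sized responses, so the size is only visible rounded up to a multiple of `block`. It assumes the total length is already known, e.g. from the `Content-Range` header of the first response.

```python
def padded_ranges(content_length: int, block: int) -> list[tuple[int, int]]:
    """Inclusive (first, last) byte ranges, each exactly `block` bytes wide,
    that together cover bytes 0 .. content_length - 1.

    If the resource is shorter than one block, a single block-sized range is
    returned; the server simply answers with fewer bytes.
    """
    if content_length <= block:
        return [(0, block - 1)]
    ranges = []
    start = 0
    while start + block < content_length:
        ranges.append((start, start + block - 1))
        start += block
    # Final block: shift it back so it ends exactly at the last byte,
    # overlapping the previous range instead of being shorter.
    last_start = content_length - block
    ranges.append((last_start, content_length - 1))
    return ranges


def range_header(first: int, last: int) -> str:
    """Format one range as an HTTP Range header value."""
    return f"bytes={first}-{last}"
```

For a 2,500-byte resource and 1,000-byte blocks this produces `bytes=0-999`, `bytes=1000-1999`, and `bytes=1500-2499`: three identical-sized requests, with the last one overlapping the second.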

duskwuff · on Oct 27, 2019

Telegram does crawl links sent in non-encrypted messages. Links in secret chats are left alone, though.

brennebeck · on Oct 26, 2019

Probably Signal.

beager · on Oct 26, 2019

A good way to enable delivery tracking for PDFs over messenger, I guess!

NullPrefix · on Oct 26, 2019

I assume the tracking happens before the recipient sees the message. This could be used to track sent messages, not received ones.
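Mechanically, that kind of tracking just means minting a unique link per recipient and watching for hits on any server that logs request paths. A minimal sketch with hypothetical names (`ISSUED`, the `/t/` path, and the base URL are illustrative). As noted above, a hit only shows that the message was sent and crawled, not that the recipient opened it.

```python
import secrets
from typing import Optional

# Hypothetical in-memory registry: token -> recipient.
ISSUED: dict[str, str] = {}


def tracking_url(base: str, recipient: str) -> str:
    """Mint an unguessable URL tied to one recipient, to embed in the PDF sent to them."""
    token = secrets.token_urlsafe(16)
    ISSUED[token] = recipient
    return f"{base.rstrip('/')}/t/{token}"


def recipient_for(path: str) -> Optional[str]:
    """Map a logged request path like '/t/<token>' back to the recipient, if any."""
    prefix = "/t/"
    if not path.startswith(prefix):
        return None
    return ISSUED.get(path[len(prefix):])
```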

beager · on Oct 26, 2019

Yep, there's still no “open” tracking on PDFs, at least AFAIK, from a pixel/beacon standpoint. Entire businesses like DocuSign are built with that value prop in mind.

fsociety · on Oct 26, 2019

It's to protect against malware and phishing. Plenty of other companies do this too, and it is reasonable. Someone else here put it really well: damned if you do and damned if you don't.

zzo38computer · on Oct 27, 2019

Would making a passworded PDF file help a bit? Then you can tell them the password by a separate message.

(And, I don't use Facebook, anyways.)

omani · on Oct 27, 2019

seriously. how many times has the world told you to stop using facebook?

stop using facebook. and no, there is not a single reason for you to be there. trust me. how do you think we lived our lives before there was a facebook?

sys_64738 · on Oct 26, 2019

I only use the messenger thing on the Facebook webpage. I don’t generally install apps on my phone as I don’t trust them. I trust an ad company like Facebook the least.

ariyadi · on Oct 27, 2019

Let’s play this game

burtonator · on Oct 26, 2019

Facebook: we can crawl you but you can't crawl us!

tus88 · on Oct 26, 2019

Not to be flippant, but if you choose to use Facebook you are already throwing your privacy to the wind, and you are probably getting what you deserve.