
While that may be true, it’s also true regardless of the ranking algorithm or which web search is the winner.

Webspam will seek to game whichever search company has dominant market share and they will structure their spam to overcome the filter and ranking specifics of that engine.

Considering tools like GPT-3, one could easily imagine, in the limit, a spammer running a large number of searches through a search engine, finding out what ranks high, and then training a generative model on that dataset to produce similar articles. Auxiliary signals like inbound links and DNS records can usually be worked around too, by purchasing domains or buying inbound links.

It will always be a war and there is never going to be a victory over webspam. Even with something like web3, where posting content costs money, I can imagine ways to spam.



I guess in a perfect magical world where the algorithm detected "suitability for humans for the given query", if GPT-3 spat out a bunch of stuff and it ranked highly, it would mean this wasn't really spam, because it was useful to humans, even if it was generated in a spammy way.


Well, it could still be spam that just very closely resembles non-spam, but the end result is you didn't get the information you were looking for, just something that was close to it, but misinformation.

Imagine you're searching for vaccine information. GPT-3 generated spam provides pages that look very much like authentic NIH medical reports and expert opinion, only every piece of factual data has been replaced with data favoring the spammer's product (let's say a competing pharma vaccine that is less effective).

Search engines are not arbiters of truth, they can only approximate it via consensus knowledge, and if consensus is distorted by vast amounts of spam, well, there's a problem.

Think of it like a blockchain ledger. If someone steals more than 50% of the hashing power, then all bets are off. Well, if someone steals a non-trivial amount of a subject area with spam, then consensus algorithms break down, and the 'ledger' for that subject looks to be weighted towards the spammer.
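
To make the consensus point concrete, here is a toy sketch. It assumes a deliberately naive "consensus" that just picks the most common claim across pages on a topic; the claim strings and page counts are made up for illustration, and no real search engine is claimed to work this way.

```python
from collections import Counter


def consensus_answer(claims):
    """Return the most common claim and its share of all pages (naive majority vote)."""
    counts = Counter(claims)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(claims)


# Hypothetical corpus: 40 honest pages agree on claim A.
honest_pages = ["claim A"] * 40
# Hypothetical spam run: 60 generated pages all push claim B.
spam_pages = ["claim B"] * 60

before = consensus_answer(honest_pages)
after = consensus_answer(honest_pages + spam_pages)

print(before)
print(after)
```

Once the spam pages outnumber the honest ones for that subject, the majority flips to the spammer's claim even though no honest page changed.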


Just heavily penalize the presence of affiliate links and ads on the page and 90% of the problem will go away. But of course it's something Google will never do. It all comes down to their perverse incentives.
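
A rough sketch of what such a penalty could look like, purely as an assumption: the `Page` fields, the penalty weights, and the example URLs are all hypothetical, and the scoring formula is one arbitrary choice among many, not anything a real search engine is known to use.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float       # hypothetical base relevance score in [0, 1]
    affiliate_links: int   # number of affiliate links found on the page
    ad_slots: int          # number of ad placements found on the page


def penalized_score(page, link_penalty=0.1, ad_penalty=0.05):
    """Divide base relevance by a factor that grows with affiliate links and ads."""
    penalty = link_penalty * page.affiliate_links + ad_penalty * page.ad_slots
    return page.relevance / (1.0 + penalty)


pages = [
    Page("https://example.com/review", relevance=0.9, affiliate_links=12, ad_slots=8),
    Page("https://example.org/notes", relevance=0.7, affiliate_links=0, ad_slots=0),
]

# Highest penalized score first.
ranked = sorted(pages, key=penalized_score, reverse=True)
for page in ranked:
    print(page.url, round(penalized_score(page), 3))
```

With these made-up numbers the ad-free page outranks the nominally more relevant page that is loaded with affiliate links.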



