This is next-generation blogspam. You know when you search for something technical hoping for a link to a reference manual, and instead you get page after page of low-effort blogpost tutorials? Now some of those are so low-effort that they aren't even written by people.
Bookmark your favorite reference sites and use their search. We're coming back around to the pre-Google era of search engine result quality. Who knows, maybe human-curated web directories[0] will come back into fashion.
I know the Kagi drumbeat is nonstop around here, but the AI slop problem is really not as pervasive as I feared it would be. Granted, since the advent of mass enslopification, most of my daily searching is in a fairly restricted space: computer vision, 3D geometry and a smattering of python tools. I've pinned all the official documentation, raised the high-signal sites, lowered the medium.coms, and blocked all those sites that mirror GitHub issues. The vast majority of my searches get me the thing I want on the first try as the first result.
I know slop is going to ruin everything else, but I've been able to carve out my little space well enough for my on-the-job internet needs. Off the job, it's probably better to be IRL anyway.
Honestly, I don't even get why I need an online search engine to look up things because most of the time I just want examples of how to do X in programming language Y or the documentation of library Z.
It's not an infinite number of webpages. Someone could just put all those links into a .json and make an applet that lets you instantly look up any of them.
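Something in that direction is almost trivial to hack together. A minimal sketch in Python, assuming a hand-curated links.json mapping topic names to documentation URLs (the file name and its format are made up here, not an existing tool):

    import json
    import sys
    import webbrowser

    # Hypothetical links.json:
    # {"argparse": "https://docs.python.org/3/library/argparse.html", ...}
    def lookup(query, path="links.json"):
        with open(path) as f:
            links = json.load(f)
        # Naive substring match over the curated keys.
        hits = {name: url for name, url in links.items() if query.lower() in name.lower()}
        for name, url in hits.items():
            print(f"{name}: {url}")
        if hits:
            webbrowser.open(next(iter(hits.values())))  # open the first hit

    if __name__ == "__main__":
        lookup(" ".join(sys.argv[1:]))

Run it as "python lookup.py argparse" and it prints the matching curated links and opens the first one in your browser.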
"slop" is not a new term, it has been used for years to describe low quality content that is produced consistently and according to trends and it's not strictly related to AI. My guess is that some people who are not familiar to internet slang, saw the word "slop" being used when talking about AI and thought it's a new word used for AI generated content.
There was a blog post I read on HN about how existing large publications that were threatened by the internet (think big magazines) began to churn out low-quality content that would allow their sites to dominate search rankings despite offering less value than higher-quality posts from small blogs.
I think the blog post was from some kind of home air filter review site or something, but I can't find it....
Prior to meaning unwanted marketing content, it was the name of a canned meat. At least in the Anglosphere there wasn't any other widespread common meaning for it.
That's exactly what it meant. Its use to describe UCE derived from a Monty Python sketch where a restaurant customer was trying to order food without it, but it was included in every menu item.
Technically, all proper names may be words, so yes, you were/are correct in that sense. It's sort of unconventional, like saying that "Elon" is a word. The way you phrased it ("word" vs "name"), I interpreted it as having a non-brand-name meaning.
... but I think the usage is new. Either way, my thinking is that it is beneficial for a specific word for the concept of "unwanted AI-generated content" to exist, as a thing, so we can discourse about it ...
It's been getting bad for a while: top search results are normally paywalled, and there's less content from individual blogs. Facebook is a complete walled garden; its content isn't even indexed. Facebook, Reddit, Discord, etc. have almost completely killed special interest forums.
Now there is a small chance AI-generated content could make more content available outside of paywalls, with more small and micro publishers. But getting rid of the factual inaccuracies and hallucinations would be a huge task, one that Google doesn't seem to be capable of, at least judging by their recent blunders.
Can you elaborate on "wall-piercing search"? Most of them do let Googlebot index the full content, or the displayed content is enough... for a high rank. People need to stop clicking on them, or search needs to punish them (at one point Google/Matt Cutts said they would do this), but this won't happen.
As soon as social media went from a really cool place to connect with communities and people into a cesspool of partisan politics, overbearing ads, massive echo chambers, and horrific abuse of people's private information, the sheen of the internet wore off for me.
Maybe I'm just being a pessimist and only seeing the negative stuff, but it just feels overwhelmingly bad. This AI spam is just another nail in the coffin for me. I've already been pulling back from the internet for a while now, and nothing has given me any encouragement to engage any more than I absolutely have to at this point.
Filtering out that sort of thing is either too expensive or too complicated to bother with.
> We're coming back around to the pre-Google era of search engine result quality.
A solution is just to formalize this process. Does a site have a high reputation? Then return its results. What makes a high reputation? Can't really quantify "quality content", but you can say "a lack of slop", good citations, and fact checking.
There's really no such thing as "reputation quantification" (as far as I know), but it would certainly help address the "slop" problem.
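One could imagine formalizing it roughly like the sketch below. Purely illustrative: the signal names and weights are invented for this example, not taken from any real search engine.

    # Illustrative only: invented signals and weights, not a real ranking algorithm.
    def site_score(signals):
        # signals: e.g. {"slop_fraction": 0.1, "citations_per_page": 2.0, "failed_fact_checks": 0}
        score = 0.0
        score += (1.0 - signals.get("slop_fraction", 1.0)) * 0.5         # less slop, higher score
        score += min(signals.get("citations_per_page", 0.0), 5.0) * 0.1  # reward citations, capped
        score -= signals.get("failed_fact_checks", 0) * 0.25             # penalize known bad facts
        return score

    def rank(results):
        # results: list of {"url": ..., "signals": {...}}; highest-reputation sites first
        return sorted(results, key=lambda r: site_score(r["signals"]), reverse=True)

The hard part is obviously collecting honest values for those signals, not the arithmetic.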
It seems like LLMs would be great for this: "Is this email trying to sell me something?" "Is it from an acquaintance?" "I have these mailboxes, which is the most appropriate?" with some additional metadata.
Well, as long as it doesn't fall into prompt injections.
Sadly I don't know of existing solutions to do this.
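A minimal sketch of what that triage could look like, assuming the OpenAI Python SDK with an API key in the environment; the model name and mailbox names are placeholders, and the final check is the only (crude) guard against prompt injection here:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    MAILBOXES = ["Inbox", "Newsletters", "Sales", "Receipts"]

    def triage(sender, subject, body):
        prompt = (
            f"Mailboxes: {', '.join(MAILBOXES)}\n"
            f"From: {sender}\nSubject: {subject}\n\n{body[:2000]}\n\n"
            "Is this email trying to sell me something? Which mailbox fits best? "
            "Answer with one mailbox name only."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content.strip()
        # If the model answers anything unexpected (e.g. after a prompt injection
        # hidden in the email body), fall back to the inbox rather than misfiling.
        return answer if answer in MAILBOXES else "Inbox"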
> It seems like LLMs would be great for this: "Is this email trying to sell me something?" "Is it from an acquaintance?" "I have these mailboxes, which is the most appropriate?" with some additional metadata.
Why LLMs, unless it's the only hammer you have? For instance, "is it from an acquaintance?" seems like a case for a classic mail rule. It's not like an LLM can read your mind to find out who your acquaintances are, and mining your correspondence to figure it out doesn't pass the smell test. If you chat with a friend about "Steve," how would it figure out who that actually is? Should it let all mail from every self-identified Steve in?
The only way to do better than a mail rule is privacy-invading metadata collection a la Facebook (which slurped the address books of you and your friends, and tracks all the communications it can, so it knows who your friends are and who your friends' friends are).
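For comparison, the mail-rule version is a few lines and needs no model at all. A sketch assuming a hypothetical acquaintances.txt you maintain by hand, one address per line:

    from email.utils import parseaddr

    def load_acquaintances(path="acquaintances.txt"):
        with open(path) as f:
            return {line.strip().lower() for line in f if line.strip()}

    def is_from_acquaintance(from_header, known):
        # parseaddr("Steve <steve@example.com>") -> ("Steve", "steve@example.com")
        _, addr = parseaddr(from_header)
        return addr.lower() in known

Dumb, transparent, and it never hallucinates a new friend for you.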
I can't be the only one suspicious that there seems to be a coordinated set of articles (guardian and NYT) roughly at the same time pushing the same narrative (conveniently about something threatening their business). Gives me "this is extremely dangerous to our democracy" vibes.
This happens across journalism and topics, certainly not limited to this one.
I'm not sure exactly how they all have the same ideas at the same time, I suspect that journos are highly interlinked and when a spicy tip or publication drops, they all get excited about it.
A big enough fraction of a group of people attends the same events or sees the same set of tweets/articles, and eventually a decent fraction of that fraction comes up with the same ideas.
Journalists always overlap their coverage. Even 'exclusive' stories will get articles describing the exclusive coverage. A story/theme being covered from multiple sources is hardly evidence of conspiracy in a vacuum.
No. A functional epistemological environment requires something much more along the lines of "extraordinary claims require extraordinary evidence," and JAQing is an abrogation of the latter duty implied by the former action.
"Agreeing to disagree" is the last redoubt of conspiracism because it's so bankrupt in the standing of a given claim that there's nothing else to lean on.
Recorded? Do you mean as in a video recording? As a hiring manager, I'm not sure I want this; it'll take even more time to wade through each one, since you can't quickly scan through a video like you can a resume.
Haha, I remember for my first dev job thinking the hiring manager would watch a ten-minute tech demo I'd posted on YouTube of my personal project. I was so excited when I interviewed and asked if he had watched it, and he responded with "I glanced at it". Ten years later, and having hired people myself, I laugh when I think of that interaction.
That said, if someone applying to a Junior dev position linked me to a YouTube video in their cover letter explaining a personal project they’d been working on, it’d for sure put them at the top of the Junior dev pile for interviews for me. So I guess maybe it did help, as I did end up getting the job.
On the other hand, requiring recordings could reduce the number of applications you get sent, at least as long as there are requirements specific to your application that would prevent applicants from mass-submitting the same video to all of their target jobs.
There’s another type of internet junk that is more insidious than the obvious spam and AI filler content: It’s the ragebait and karma farming that happens on social media.
The weird part about ragebait content is that most of it doesn’t have an obvious monetary benefit to the posters. The goal appears to be generating as many upvotes and engagement as possible. There might be some secondary monetary goal such as building a following to sell to them later (Instagram, TikTok) or to build an account’s reputation to use it in spam operations later (Reddit, Twitter). However, I think a lot of people just post the content for the thrill of getting a lot of upvotes and seeing people worked up.
Try checking the front page of Reddit while logged out (to see the default subs). It’s a wild place that describes an alternate reality in which the worst possible outcomes are the norm. During COVID, everyone was going to die. After the vaccine, the talk was about everyone getting evicted and the “homeless wave” that was coming. Now that inflation is a topic, there are daily posts about how masses of people are going to starve to death because nobody can afford food.
A couple days ago one of the top posts on Reddit was a photo of 30 pills and nothing more than a title claiming it costs $12,000. Thousands of angry comments followed, demanding riots in the street and revolutions and talking about people dying. Buried in the comments was a single thread where someone actually looked up the price of the pills: it was $34, not $12,000. The manufacturer also has a program where people who can't afford it can get the pills for free. That didn't stop the post from staying at the top of Reddit for a day and convincing countless people of a complete lie.
This content seems to appeal to people who enjoy being outraged and don’t mind an exaggerated lie as long as it is in furtherance of a narrative they believe. People will look at that outrage thread and say it’s okay because America does have a drug price problem even if that story was a complete fabrication. This is the root problem: Too many people like the ragebait, the karma farming posts, and soon, the infinite AI generated content coming to feed them exactly the outrage they want to read every day.
One variant of ragebait I really hate is posters pretending to be stupid or confusing in order to get people to correct them, make fun of them, fight in the comments, ask questions etc, all because more comments equals more exposure.
For example, I literally just saw a video on my feed of a professional barber trying to fix a balding person's haircut by spray-painting his scalp. The barber was acting serious, pretending to show off his skills, but the real content was how ridiculously awful it looked, and this accounted for 99% of the (many) comments. He knew what he was doing.
I’ve heard people call it “engagement bait”, since the goal seems to be to get people to comment or engage with the post, since that gives them a higher profile in the algorithm’s evaluation.
Part of me thinks "ragebait" is a systematic effort by enemy nations to undermine national unity in the US and drive up discontent. A sort of psy-ops program to break down the mood and will of an entire country by constantly sowing seeds of discontent into its population.
I’m sure there’s prior art to this kind of thing but it just feels like a hunch at this point.
I'm sure that's some of it, but a lot of it is easily understood as organic. Bait gets attention. For some people that's enough, but plenty of ragebait distributors are monetizing it.
> This is the root problem: Too many people like the ragebait, the karma farming posts, and soon, the infinite AI generated content coming to feed them exactly the outrage they want to read every day.
This is a big part of the issue, and it's happening in non-social media contexts too. My parents use no social media, but they're constantly afraid and angry about things that often don't exist or aren't real issues.
I've noticed a lot of content lately that purposely farms engagement through dark patterns. For instance: getting a detail wrong or poor spelling/grammar almost always draws a lot of responses to dunk on the original creator.
When platforms like Twitter and TikTok monetize engagement you realize that these can often be intentional. Once you recognize the patterns it becomes very hard to unsee.
>> Amazon has become a marketplace for AI-produced tomes that are being passed off as having been written by humans, with travel books among the popular categories for fake work.
Yet another source of horribleness from Amazon's book department. I recently wanted to buy a softcover copy of Lovecraft's collected works. I know it's out of copyright but I want a book I can hold. What came was junk, poorly scanned to the point of having extra page numbers stolen from the original. This is what Amazon does with books. No matter how much or little you pay, the delivered book will somehow not be what you wanted.
I had a similar experience. My only copy of "The Dirt", the Motley Crue autobiography, was a hardcover that my ex pilfered from me when we broke up. I went on Amazon to get it replaced and ordered a softcover because it was significantly less expensive.
What came was exactly the same thing. I think the cover was legit, but the contents were so poorly photocopied that you could see the shadows on the corners of many of the pages where they were copied. Many of the pages were just flat-out crooked to boot. It was comical that the seller passed this off as legit.
It was the last straw for me ordering things off of Amazon. Now I try to order stuff directly from the manufacturer so I know for sure it's legit. I'm now comfortable paying more for stuff knowing that it's legit rather than rolling the dice on Amazon and getting a fake or counterfeit item.
On the other hand, some AI-generated content can be pretty good, informative, and useful. That depends on the prompt's creator and on the quality of the model. It has to be done with care to ensure quality and to avoid falsehoods and hallucinations.
Part of my personal definition of slop is that it's unreviewed.
If someone uses ChatGPT to help them write and iterate on a well researched essay about a topic and verifies that the details are correct, I don't see that as slop.
People with English as a second language are getting a huge boost from these tools. That's great!
You are presupposing that it's not possible for GPT to write a well researched essay, which shows a limited understanding of how to use it well for research. For targeted essays, the audience is a single person, and so there is no reason for someone to iterate on GPT's work; it is sufficient to review it.
It is an uncontroversial fact that LLMs make mistakes.
If a human being has reviewed the output of GPT and is willing to stake their personal reputation on that content being "correct", then I'm happy to read it.
If nobody has done that then it's slop and a waste of my time.
(Unless of course I requested that information from GPT myself, in which case it's on me to review it before sharing it with others.)
> If someone uses ChatGPT to help them write and iterate on a well researched essay about a topic and verifies that the details are correct, I don't see that as slop.
Where is the presupposition? The parent comment seems to match your sentiment.
The presupposition is that one has to write the essay or iterate upon it (using GPT). It is that GPT won't do it all for them. With a sufficiently developed workflow, I believe GPT can do it all. I guess we were thinking of different applications.
Yes, a human always has to review the output, but not necessarily to micromanage or edit it further, sometimes only to approve or reject the claims.