This is next-generation blogspam. You know when you search for something technical hoping for a link to a reference manual, and instead you get page after page of low-effort blogpost tutorials? Now some of those are so low-effort that they aren't even written by people.
Bookmark your favorite reference sites and use their search. We're coming back around to the pre-Google era of search engine result quality. Who knows, maybe human-curated web directories[0] will come back into fashion.
I know the Kagi drumbeat is nonstop around here, but the AI slop problem is really not as pervasive as I feared it would be. Granted, since the advent of mass enslopification, most of my daily searching is in a fairly restricted space: computer vision, 3D geometry and a smattering of python tools. I've pinned all the official documentation, raised the high-signal sites, lowered the medium.coms, and blocked all those sites that mirror GitHub issues. The vast majority of my searches get me the thing I want on the first try as the first result.
I know slop is going to ruin everything else, but I've been able to carve out my little space well enough for my on-the-job internet needs. Off the job, it's probably better to be IRL anyway.
Honestly, I don't even get why I need an online search engine to look up things because most of the time I just want examples of how to do X in programming language Y or the documentation of library Z.
It's not an infinite number of webpages. Someone could just put all those links into a .json and make an applet that lets you instantly look up any of them.
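Something in that direction is almost trivial to hack together. A minimal sketch in Python, assuming a hand-curated links.json mapping topic names to documentation URLs (the file name and its format are made up here, not an existing tool):

    import json
    import sys
    import webbrowser

    # Hypothetical links.json:
    # {"argparse": "https://docs.python.org/3/library/argparse.html", ...}
    def lookup(query, path="links.json"):
        with open(path) as f:
            links = json.load(f)
        # Naive substring match over the curated keys.
        hits = {name: url for name, url in links.items() if query.lower() in name.lower()}
        for name, url in hits.items():
            print(f"{name}: {url}")
        if hits:
            webbrowser.open(next(iter(hits.values())))  # open the first hit

    if __name__ == "__main__":
        lookup(" ".join(sys.argv[1:]))

Run it as "python lookup.py argparse" and it prints the matching curated links and opens the first one in your browser.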
"slop" is not a new term, it has been used for years to describe low quality content that is produced consistently and according to trends and it's not strictly related to AI. My guess is that some people who are not familiar to internet slang, saw the word "slop" being used when talking about AI and thought it's a new word used for AI generated content.
There was a blog post I read on HN about how existing large publications that were threatened by the internet (think big magazines) began to churn out low-quality content that would allow their sites to dominate search rankings despite offering less value than higher-quality posts from small blogs.
I think the blog post was from some kind of home air filter review site or something, but I can't find it....
Prior to meaning unwanted marketing content, it was the name of a canned meat. At least in the Anglosphere there wasn't any other widespread common meaning for it.
That's exactly what it meant. Its use to describe UCE derived from a Monty Python sketch where a restaurant customer was trying to order food without it, but it was included in every menu item.
Technically, all proper names may be words, so yes, you were/are correct in that sense. It's sort of unconventional, like saying that "Elon" is a word. The way you phrased it ("word" vs "name"), I interpreted it as having a non-brand-name meaning.
... but I think the usage is new. Either way, my thinking is that it is beneficial for a specific word for the concept of "unwanted AI-generated content" to exist, as a thing, so we can discourse about it ...
It's been getting bad for a while: top search results are normally paywalled, and there's less content from individual blogs. Facebook is a complete walled garden; its content isn't even indexed. Facebook, Reddit, Discord, etc. have almost completely killed special interest forums.
Now there is a small chance AI-generated content could make more content available outside of paywalls, with more small and micro publishers. But getting rid of the factual inaccuracies and hallucinations would be a huge task, one that Google doesn't seem to be capable of, at least judging by their recent blunders.
Can you elaborate on "wall-piercing search"? Most of them do let Googlebot index the full content, or the displayed content is enough... for a high rank. People need to stop clicking on them, or search needs to punish them (at one point Google/Matt Cutts said they would do this), but this won't happen.
As soon as social media went from a really cool place to connect with communities and people into a cesspool of partisan politics, overbearing ads, massive echo chambers, and horrific abuse of people's private information, the sheen of the internet wore off for me.
Maybe I'm just being a pessimist and only seeing the negative stuff, but it just feels overwhelmingly bad. This AI spam is just another nail in the coffin for me. I've already been pulling back from the internet for a while now, and nothing has given me any encouragement to engage any more than I absolutely have to at this point.
Filtering out that sort of thing is either too expensive or too complicated to bother with.
> We're coming back around to the pre-Google era of search engine result quality.
A solution is just to formalize this process. Does a site have a high reputation? Then return its results. What makes a high reputation? Can't really quantify "quality content", but you can say "a lack of slop", good citations, and fact checking.
There's really no such thing as "reputation quantification" (as far as I know), but it would certainly help address the "slop" problem.
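One could imagine formalizing it roughly like the sketch below. Purely illustrative: the signal names and weights are invented for this example, not taken from any real search engine.

    # Illustrative only: invented signals and weights, not a real ranking algorithm.
    def site_score(signals):
        # signals: e.g. {"slop_fraction": 0.1, "citations_per_page": 2.0, "failed_fact_checks": 0}
        score = 0.0
        score += (1.0 - signals.get("slop_fraction", 1.0)) * 0.5         # less slop, higher score
        score += min(signals.get("citations_per_page", 0.0), 5.0) * 0.1  # reward citations, capped
        score -= signals.get("failed_fact_checks", 0) * 0.25             # penalize known bad facts
        return score

    def rank(results):
        # results: list of {"url": ..., "signals": {...}}; highest-reputation sites first
        return sorted(results, key=lambda r: site_score(r["signals"]), reverse=True)

The hard part is obviously collecting honest values for those signals, not the arithmetic.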
It seems like LLMs would be great for this: "Is this email trying to sell me something?" "Is it from an acquaintance?" "I have these mailboxes, which is the most appropriate?" with some additional metadata.
Well, as long as it doesn't fall into prompt injections.
Sadly I don't know of existing solutions to do this.
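A minimal sketch of what that triage could look like, assuming the OpenAI Python SDK with an API key in the environment; the model name and mailbox names are placeholders, and the final check is the only (crude) guard against prompt injection here:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    MAILBOXES = ["Inbox", "Newsletters", "Sales", "Receipts"]

    def triage(sender, subject, body):
        prompt = (
            f"Mailboxes: {', '.join(MAILBOXES)}\n"
            f"From: {sender}\nSubject: {subject}\n\n{body[:2000]}\n\n"
            "Is this email trying to sell me something? Which mailbox fits best? "
            "Answer with one mailbox name only."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content.strip()
        # If the model answers anything unexpected (e.g. after a prompt injection
        # hidden in the email body), fall back to the inbox rather than misfiling.
        return answer if answer in MAILBOXES else "Inbox"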
> It seems like LLMs would be great for this: "Is this email trying to sell me something?" "Is it from an acquaintance?" "I have these mailboxes, which is the most appropriate?" with some additional metadata.
Why LLMs, unless it's the only hammer you have? For instance, "is it from an acquaintance?" seems like a case for a classic mail rule. It's not like an LLM can read your mind to find out who your acquaintances are, and mining your correspondence to figure it out doesn't pass the smell test. If you chat with a friend about "Steve," how would it figure out who that actually is? Should it let all mail from every self-identified Steve in?
The only way to do better than a mail rule is privacy-invading metadata collection a la Facebook (which slurped the address books of you and your friends, and tracks all the communications it can, so it knows who your friends are and who your friends' friends are).
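For comparison, the mail-rule version is a few lines and needs no model at all. A sketch assuming a hypothetical acquaintances.txt you maintain by hand, one address per line:

    from email.utils import parseaddr

    def load_acquaintances(path="acquaintances.txt"):
        with open(path) as f:
            return {line.strip().lower() for line in f if line.strip()}

    def is_from_acquaintance(from_header, known):
        # parseaddr("Steve <steve@example.com>") -> ("Steve", "steve@example.com")
        _, addr = parseaddr(from_header)
        return addr.lower() in known

Dumb, transparent, and it never hallucinates a new friend for you.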
I can't be the only one suspicious that there seems to be a coordinated set of articles (guardian and NYT) roughly at the same time pushing the same narrative (conveniently about something threatening their business). Gives me "this is extremely dangerous to our democracy" vibes.
This happens across journalism and topics, certainly not limited to this one.
I'm not sure exactly how they all have the same ideas at the same time, I suspect that journos are highly interlinked and when a spicy tip or publication drops, they all get excited about it.
A big enough fraction of a group of people attends the same events or sees the same set of tweets/articles, and eventually a decent fraction of that fraction comes up with the same ideas.
Journalists always overlap their coverage. Even 'exclusive' stories will get articles describing the exclusive coverage. A story/theme being covered from multiple sources is hardly evidence of conspiracy in a vacuum.
No. A functional epistemological environment requires something much more along the lines of "extraordinary claims require extraordinary evidence," and JAQing is an abrogation of the latter duty implied by the former action.
"Agreeing to disagree" is the last redoubt of conspiracism because it's so bankrupt in the standing of a given claim that there's nothing else to lean on.
Recorded? Do you mean as in a video recording? As a hiring manager, I'm not sure I want this; it'll take even more time to wade through each one, since you can't quickly scan through a video like you can a resume.
Haha, I remember for my first dev job thinking the hiring manager would watch a ten-minute tech demo I'd posted on YouTube of my personal project. I was so excited when I interviewed and asked if he had watched it, and he responded with "I glanced at it". Ten years later, and having hired people myself, I laugh when I think of that interaction.
That said, if someone applying to a Junior dev position linked me to a YouTube video in their cover letter explaining a personal project they’d been working on, it’d for sure put them at the top of the Junior dev pile for interviews for me. So I guess maybe it did help, as I did end up getting the job.
On the other hand, requiring recordings could reduce the number of applications you get sent, at least as long as there are requirements specific to your application that would prevent applicants from mass-submitting the same video to all of their target jobs.
There’s another type of internet junk that is more insidious than the obvious spam and AI filler content: It’s the ragebait and karma farming that happens on social media.
The weird part about ragebait content is that most of it doesn’t have an obvious monetary benefit to the posters. The goal appears to be generating as many upvotes and engagement as possible. There might be some secondary monetary goal such as building a following to sell to them later (Instagram, TikTok) or to build an account’s reputation to use it in spam operations later (Reddit, Twitter). However, I think a lot of people just post the content for the thrill of getting a lot of upvotes and seeing people worked up.
Try checking the front page of Reddit while logged out (to see the default subs). It’s a wild place that describes an alternate reality in which the worst possible outcomes are the norm. During COVID, everyone was going to die. After the vaccine, the talk was about everyone getting evicted and the “homeless wave” that was coming. Now that inflation is a topic, there are daily posts about how masses of people are going to starve to death because nobody can afford food.
A couple days ago one of the top posts on Reddit was a photo of 30 pills and nothing more than a title claiming it costs $12,000. Thousands of angry comments followed, demanding riots in the street and revolutions and talking about people dying. Buried in the comments was a single thread where someone actually looked up the price of the pills: it was $34, not $12,000. The manufacturer also has a program where people who can't afford it can get the pills for free. That didn't stop the post from staying at the top of Reddit for a day and convincing countless people of a complete lie.
This content seems to appeal to people who enjoy being outraged and don’t mind an exaggerated lie as long as it is in furtherance of a narrative they believe. People will look at that outrage thread and say it’s okay because America does have a drug price problem even if that story was a complete fabrication. This is the root problem: Too many people like the ragebait, the karma farming posts, and soon, the infinite AI generated content coming to feed them exactly the outrage they want to read every day.
One variant of ragebait I really hate is posters pretending to be stupid or confusing in order to get people to correct them, make fun of them, fight in the comments, ask questions etc, all because more comments equals more exposure.
For example, I literally just saw a video on my feed of a professional barber trying to fix a balding person's haircut by spray-painting his scalp. The barber was acting serious, pretending to show off his skills, but the real content was how ridiculously awful it looked, and this accounted for 99% of the (many) comments. He knew what he was doing.
I’ve heard people call it “engagement bait”, since the goal seems to be to get people to comment or engage with the post, since that gives them a higher profile in the algorithm’s evaluation.
Part of me thinks "ragebait" is a systematic effort by enemy nations to undermine national unity in the US and drive up discontent. A sort of psy-ops program to break down the mood and will of an entire country by constantly sowing seeds of discontent into its population.
I’m sure there’s prior art to this kind of thing but it just feels like a hunch at this point.
I'm sure that's some of it, but a lot of it is easily understood as organic. Bait gets attention. For some people that's enough, but plenty of ragebait distributors are monetizing it.
> This is the root problem: Too many people like the ragebait, the karma farming posts, and soon, the infinite AI generated content coming to feed them exactly the outrage they want to read every day.
This is a big part of the issue, and it's happening in non-social media contexts too. My parents use no social media, but they're constantly afraid and angry about things that often don't exist or aren't real issues.
I've noticed a lot of content lately that purposely farms engagement through dark patterns. For instance: getting a detail wrong or poor spelling/grammar almost always draws a lot of responses to dunk on the original creator.
When platforms like Twitter and TikTok monetize engagement you realize that these can often be intentional. Once you recognize the patterns it becomes very hard to unsee.
>> Amazon has become a marketplace for AI-produced tomes that are being passed off as having been written by humans, with travel books among the popular categories for fake work.
Yet another source of horribleness from Amazon's book department. I recently wanted to buy a softcover copy of Lovecraft's collected works. I know it's out of copyright but I want a book I can hold. What came was junk, poorly scanned to the point of having extra page numbers stolen from the original. This is what Amazon does with books. No matter how much or little you pay, the delivered book will somehow not be what you wanted.
I had a similar experience. My only copy of "The Dirt", the Motley Crue autobiography, was a hardcover that my ex pilfered from me when we broke up. I went on Amazon to get it replaced and ordered a softcover because it was significantly less expensive.
What came was exactly the same thing. I think the cover was legit, but the contents were so poorly photocopied that you could see the shadows on the corners of many of the pages where they were copied. Many of the pages were just flat-out crooked to boot. It was comical that the seller passed this off as legit.
It was the last straw for me ordering things off of Amazon. Now I try to order stuff directly from the manufacturer so I know for sure it's legit. I'm now comfortable paying more for stuff knowing that it's legit rather than rolling the dice on Amazon and getting a fake or counterfeit item.
On the other hand, some AI-generated content can be pretty good, informative, and useful. That depends on the prompt's creator and on the quality of the model. It has to be done with care to ensure quality and to avoid falsehoods and hallucinations.
Part of my personal definition of slop is that it's unreviewed.
If someone uses ChatGPT to help them write and iterate on a well researched essay about a topic and verifies that the details are correct, I don't see that as slop.
People with English as a second language are getting a huge boost from these tools. That's great!
You are presupposing that it's not possible for GPT to write a well researched essay, which shows a limited understanding of how to use it well for research. For targeted essays, the audience is a single person, and so there is no reason for someone to iterate on GPT's work; it is sufficient to review it.
It is an uncontroversial fact that LLMs make mistakes.
If a human being has reviewed the output of GPT and is willing to stake their personal reputation on that content being "correct", then I'm happy to read it.
If nobody has done that then it's slop and a waste of my time.
(Unless of course I requested that information from GPT myself, in which case it's on me to review it before sharing it with others.)
> If someone uses ChatGPT to help them write and iterate on a well researched essay about a topic and verifies that the details are correct, I don't see that as slop.
Where is the presupposition? The parent comment seems to match your sentiment.
The presupposition is that one has to write the essay or iterate upon it (using GPT). It is that GPT won't do it all for them. With a sufficiently developed workflow, I believe GPT can do it all. I guess we were thinking of different applications.
Yes, a human always has to review the output, but not necessarily to micromanage or edit it further, sometimes only to approve or reject the claims.