Hacker News | rspeer's comments

There are healthy recommender systems, like Spotify.

YouTube is a _disastrously_ unhealthy recommender system, and they've let it go completely out of control.


Spotify's recommendation system is dealing mostly with artists that have recording contracts and professional production; their problem shouldn't be compared to YouTube's, which has to deal with a mix of professional, semi-pro, and amateur content. Also, there's more of a "freshness" aspect to a lot of YT videos that isn't quite the same as what Spotify has to deal with (pop songs are usually good for a few months, but many vlogs can be stale after a week). Not only that, but many channels have a mix of content, some that goes stale quickly and some that is still relevant after many months. How does a recommendation engine figure that out?

It's better to compare Spotify's recommendations to Netflix's recommendations, which also deal with mostly professional content. Those two systems have comparable performance in my opinion.


Why the content exists is also important. People create video specifically for YouTube. Very few people create music just to host it on Spotify. As a result, the recommendation algorithm and all its quirks have a much bigger impact on the content of YouTube than on the content of Spotify. Also, having that many people actively trying to game the recommendation algorithm can pervert that algorithm. That simply isn't a problem for sites like Spotify or Netflix.


>YouTube is a _disastrously_ unhealthy recommender system,

Can you explain with more details?

I use Youtube as a crowdsourced "MOOC"[0] and the algorithms usually recommended excellent followup videos for most topics.

(On the other hand, their attempt at matching "relevant" advertising to the video is often terrible. (E.g. Sephora makeup videos for women shown to the male-dominated audience of audiophile gear.) Leaving aside the weird ads, the algorithm works very well for educational vids that interest me.)

[0] https://en.wikipedia.org/wiki/Massive_open_online_course


Yes. Elsagate is an example - the creepy computer-generated violent and disturbing videos that eventually follow children's content - or the fact that just about every gaming-related video has a recommendation for a far-right rant against feminism or a Ben Shapiro screaming segment. There's also the Amazon problem - where everything related to the thing you watched once out of curiosity follows you everywhere around the site.


>Elsagate is an example,

Yes, I was aware of Elsagate.[0] I don't play games so didn't realize every gaming video ends up with unwanted far-right and Ben Shapiro videos.

I guess I should have clarified my question. I thought gp's "unhealthy" meant Youtube's algorithm was bad for somebody like me that views mainstream non-controversial videos. (Analogy might be gp (rspeer) warning me that asbestos and lead paint are actually cancerous but the public doesn't know it.)

[0] https://news.ycombinator.com/item?id=20090157


> I don't play games so didn't realize every gaming video ends up with unwanted far-right and Ben Shapiro videos.

They don't. That's confirmation bias at work.


It's not 100%, but I'd consider "video games" => "Ben Shapiro" to be a pretty awful recommendation system, regardless of the reasoning behind it. As far as I know, the group "video gamers" doesn't have a political lean in either direction.

I've definitely seen this with comics. I watched a few videos criticizing Avengers: Infinity War, and now I see mostly Ben Shapiro recs. It makes no sense. I never have (and never plan to) seek out political content on YouTube.


I watch a number of gaming videos and have never had a far-right video recommended. Don't know who Ben Shapiro is.

It could be the type of games involved, since I usually watch strategy, 4x, city-building, and military sims. I usually get history-channel documentaries or "here's how urban planning works in the real world" videos recommended, which suits me fine. Somebody whose gaming preferences involve killing Nazis in a WW2-era FPS might be more likely to get videos that have neo-Nazis suggesting we kill people.


Some of the child comments of your thread mention the nazi problem.


But that child comment didn't link Nazis to normal "video games". I assumed he just meant some folks (e.g. "1.8%" of web surfers) with the predilection for far-right videos would get more Nazi recommendations. Well yes, I would have expected the algorithm to feed more of what they seemed to like.

I do not see any Nazi far-right videos in 1.8% of my recommendations ever.


Isn't that an inevitable side effect of collaborative filtering? If companies could do content-based recommendation, wouldn't they? Until purely content-based recommendations are possible, wisdom of the crowds via collaborative filtering will lump together videos that are about different things but watched by similar viewers.
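
To make that concrete, here is a minimal sketch of item-to-item collaborative filtering. The viewers, video names, and watch histories are all invented for illustration, and cosine similarity over viewer sets is just one common choice, not a claim about what YouTube actually runs. Note that the code never looks at what a video is about, only at who watched it:

```python
from math import sqrt

# Hypothetical watch histories: viewer -> set of videos watched.
watches = {
    "a": {"game_review", "political_rant"},
    "b": {"game_review", "political_rant"},
    "c": {"game_review", "city_builder_guide"},
    "d": {"piano_chords"},
}


def viewers_of(video):
    """Return the set of viewers who watched the given video."""
    return {user for user, vids in watches.items() if video in vids}


def similarity(v1, v2):
    """Cosine similarity between the viewer sets of two videos."""
    a, b = viewers_of(v1), viewers_of(v2)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(video):
    """Rank other videos by audience overlap, ignoring their content."""
    all_videos = set().union(*watches.values())
    scored = [(similarity(video, other), other)
              for other in all_videos if other != video]
    return [other for score, other in sorted(scored, reverse=True) if score > 0]


print(recommend("game_review"))
```

With this toy data, "political_rant" ranks first for "game_review" simply because two of the three game viewers also watched it, which is the lumping-together effect described above.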


Spotify simply does not have the content over which an algorithm could lose control.


Spotify has 40M tracks total. On YouTube, more than 5B videos are watched by users every day. Different scales of problem demand different solutions.


I don't know what the comment you are replying to meant. I interpreted it to mean the algo takes you down a rabbit hole to darker content. For me, though, I miss the days when it actually recommended relevant videos, similar to the one I was watching.

My entire sidebar is now just a random assortment of irrelevant interests. For instance I wanted to learn to play a denser piano chord, I learned it ages ago but I still get like 20 videos that explain how to add extensions to a 7 chord, even if I'm watching a video on the F-35 fighter pilot.


I completely disagree, my children have a wonderful time following the recommended videos that youtube provides. I'm interested to hear your reasoning on why it is "disastrous".


How is Spotify's different from Youtube?


I'm pretty sure all content on Spotify gets manually curated first, so abusive tagging doesn't happen, and some of the worst content simply doesn't get uploaded at all. Spotify also doesn't try to be a news site, so they can afford to have a couple of weeks' lag between uploading a song and having it show up in people's recommendation feed.


More selective recommendation, all-subscriber environment.


I disagree in some sense. I personally have found the recommendation system on YouTube pretty good for the main page of the site. The thing that bugs me is the recommended bar to the right (or bottom right) of the videos, which can be really annoying and infested with clickbait etc.


That's not very much.


Serving advertisements is certainly very bandwidth and energy intensive at least when talking about news/media sites.

The majority of the data that is getting pushed around during a page load is ad delivery or tracking code to inform the ad delivery. That is a lot of servers, load balancers etc etc working hard pushing around ad code. The actual text of a news article is a tiny part of the whole workload.

Once the site is being loaded by your browser (ie you have downloaded all the JS and your machine has parsed a few MB of ad/tracking code), depending on the site's ad provider(s) there is potentially a real time auction going on for every ad slot on every page load. Multiple ad networks hold automated auctions within themselves which are rolled up into an auction between the winner of each of those auctions. There are literally banks of big beefy servers scattered around the world bidding on ad slots 24/7.
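
A rough sketch of that rolled-up auction for a single ad slot, to show how much bidding happens per page load. The network names, advertisers, and bid amounts are invented, and the pricing rule (what the winner actually pays) is deliberately left out because it varies by exchange:

```python
# Hypothetical bids (dollars per thousand impressions) for one ad slot,
# grouped by the ad network each advertiser bids through.
bids_by_network = {
    "network_a": {"dog_sweaters": 2.10, "car_insurance": 1.75},
    "network_b": {"meal_kits": 2.40},
    "network_c": {"vpn_service": 0.90, "mattresses": 1.20},
}


def network_winner(bids):
    """Internal auction: the highest bidder within one network."""
    advertiser = max(bids, key=bids.get)
    return advertiser, bids[advertiser]


def run_slot_auction(networks):
    """Roll the per-network winners up into one final auction."""
    finalists = {
        network: network_winner(bids)
        for network, bids in networks.items()
        if bids  # skip networks that submitted no bids
    }
    best_network = max(finalists, key=lambda name: finalists[name][1])
    advertiser, bid = finalists[best_network]
    return best_network, advertiser, bid


print(run_slot_auction(bids_by_network))
```

Every call to `run_slot_auction` here stands in for work that, in the real system, is spread across many servers and repeated for every slot on every page load.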

A breath-taking amount of energy and bandwidth get used up to display an annoying 'sweater for dogs' ad next to a news article.


That's literally all Facebook and Google do.


It absolutely and obviously is not.

They do significantly more than this, even if you wanted to (incorrectly) classify all of their products like this, they actually do more than that to entice people in.


It seems remarkable to me that anyone can honestly hold the opinion that Google do not do anything other than serve adverts or send marketing emails.

They have a search engine. Does that really need actually pointing out?


Maybe they meant human energy (and potential).


Well, it's not true. Bitcoin has increased the demand for coal power in China. Bitcoin miners flocked to New York State for cheap natural gas power.

Green energy is not necessarily the cheapest form of energy. If it were, it wouldn't even be _hard_ to stop burning fossil fuels and prevent climate change.


Here's a NYT article about Bitcoin miners moving to New York State last year, but not for natural gas -- instead to a depressed ex-industrial town that has overprovisioned hydroelectric, as you'd expect:

https://www.nytimes.com/2018/09/19/nyregion/bitcoin-mining-n...

When did miners go to NYS for natural gas power? Are they still there, or did they get undercut by hydro from somewhere else in the world?


Green energy isn't the cheapest source everywhere in the world, but Bitcoin mining has no geographic restriction. Instead of picking a location and then choosing an energy source once there, as most of us do, they can first pick a green energy source and then choose a location anywhere in the world that provides it.

Thus, there's no analogy to the challenge the rest of the world faces in moving away from fossil fuels.


What’s your source? From what I’ve heard bitcoin mining farms in China are located in regions taking pretty much all energy from hydro.


The person you're replying to is clearly aware of search ranking algorithms. You should try looking into the HITS algorithm they mentioned for some additional context.


To be fair, I'm an ops guy, not a search engineer. ;) It's valid to say AV might not have implemented the basic concepts as well and I don't want to devalue Google's innovation. It just annoys me when people assume PageRank was a unicorn and nobody else was doing anything similar.


As much as I value PageRank, it annoys me when people assume it was a novel idea.

It had been applied decades before to scientific paper references, as a measure to improve on the "number of references" metric, which is more easily gamed. References are more rarely circular (only papers in preparation at the same time can cite each other and form cycles, unlike web pages). I was sitting in a class about stationary processes in 1996 when the lecturer mentioned this (already old and well known at the time) use case as motivation.
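
For anyone who hasn't seen it, the idea is to rank each paper by the stationary distribution of a random walk over the reference graph. A minimal sketch by power iteration, assuming a made-up four-paper citation graph and the commonly quoted damping factor of 0.85 (neither comes from that lecture or from any real implementation):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate the stationary distribution of a damped random walk."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share from random jumps.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if not targets:
                # A paper that cites nothing spreads its score evenly.
                targets = nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: paper -> papers it references.
citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}

ranks = pagerank(citations)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

Here "paper_d" is cited only once, yet it ends up ranked highest because its one citation comes from the well-cited "paper_c". That is the improvement over a raw citation count.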

Whatever AV implemented at the time, it was not on par.


Well, HITS is applied after you’ve already selected a subset, at response time; so, if you didn’t select a good subset (and AV often didn’t) then picking the most promising out of that subset is not as helpful.

Pagerank is essentially a “universal authority score” (in the HITS terminology), and it worked well because at the time you didn’t have pages that were authority for one subject and spam for another. You do now - which is why pagerank is now one signal out of 200, even though it was sufficient on its own 20 years ago.
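
Here is a rough sketch of HITS run at query time over an already-selected subset. The link graph and page names are invented for illustration, and the fixed iteration count is an assumption, not anything AltaVista or Google is known to have used:

```python
from math import sqrt


def hits(links, iterations=50):
    """Compute hub and authority scores for a small link graph."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to you.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages you link to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        # Normalize both score vectors so they do not grow without bound.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical subset of pages already retrieved for one query.
subset = {
    "dir_1": ["site_x", "site_y"],
    "dir_2": ["site_x"],
    "site_y": ["site_x"],
}

hub, auth = hits(subset)
print("top authority:", max(auth, key=auth.get))
print("top hub:", max(hub, key=hub.get))
```

The scores only describe the pages passed in as `subset`, so a badly chosen subset gives a confident ranking of the wrong pages.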


Most of the projects you're thinking of probably never even generated high-resolution images. Until recently, the actual output layer for most of these systems would be something like a 256x256 array of pixels.


Yes. The arrest was for harassment and doxxing, not for misgendering.

paganel, when you say "Sorry for the Daily Mail link" it sounds like you know you're citing a source that will be extremely untrustworthy on this topic, and then did it anyway.


This is a revisionist and false narrative, and I believe it may be inspired by the "just world fallacy": you may be inclined to believe the world couldn't actually be unfair to women, so you search for another reason why their contributions would actually have been less important.

Your narrative is historically incorrect. Women invented many of the things that made software engineering _less_ tedious, such as debuggers (Betty Holberton, 1945), subroutines (Kay McNulty, circa 1947), assemblers (Kathleen Booth, 1947), linkers (Grace Hopper, 1952), and compilers (Grace Hopper, 1954). Programmers have always strived to make their job easier. If you don't see these inventions as creative effort, I don't know how to help you.

Even the parts of programming that were tedious were still important. If you saw someone tediously programming in assembly code today, you might celebrate the effort.

The flowcharts that these women were handed by their managers were not "program logic". There were many difficult details to work out about how to implement a program given the limited computing power of the time. You would not describe a manager today who draws a flowchart as having "written a program".


You jump to conclusions and project your own anger/frustrations that have absolutely nothing to do with me. First of all, I totally agree with you that it was much harder for women to succeed in those times; there's no need to explain that to me. Second, I never even mentioned gender issues anywhere. I was talking about the difference between what was called "programming" and what was "coding" back in those days. And of course women invented a lot of great stuff, but as you said yourself it was programming. The coding part, on the other hand, was boring as hell: in order to write and run a program you had to do a lot of manual labour. It's just not comparable to anything today. You mention assembler, but imagine doing asm using only pen & paper, and then giving that program to someone else to turn it into holes on punched cards. And then you had to reserve some time to run that code on a mainframe (where an operator would run it for you and hand you back the results), and there was usually a long waiting list for that, so if you'd made any mistakes it would take days to get a chance to re-run it. It was very different from coding today, and involved more steps and more people.


No, they weren't doing "tedious data entry", they were writing the entire program. And I'd say it would be less boring than programming in 2019, which is easier and pre-existing libraries solve most of it for us.

You are doing exactly the thing the article describes.


I've certainly encountered this issue.

When I release ConceptNet, I would like to follow SemVer, but it kind of has to be in a way that the second component of the version is the major version, because there are bigger things that happen in a multi-decade project than just API changes.

The project is ConceptNet 5. It is the fifth project named ConceptNet. What it has in common with ConceptNet 4 is some of the data and the spirit of the problem to be solved; the design is quite different.

It's on the sixth major version after the initial release, so I call it ConceptNet 5.6. The current version is 5.6.4. If I wanted to use tools that expect SemVer, I'd have to call it something like "ConceptNet 5 v6.4". I find that confusing, and I think other people would too.
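
As a minimal sketch of the renaming I mean (the function name is made up, and this is not something ConceptNet ships):

```python
def semver_style_label(version, project="ConceptNet"):
    """Move the leading 'generation' number into the project name."""
    generation, major, minor = version.split(".")
    return f"{project} {generation} v{major}.{minor}"


print(semver_style_label("5.6.4"))  # ConceptNet 5 v6.4
```

A strict SemVer tool would also want a third component, so the remainder would presumably have to be padded to something like 6.4.0, which is one more thing to explain to users.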


And it took me 3 months, $180, and being ordered to publish a notice in the local newspaper who was really uninterested in following through and then charged me another $170 for it.

It's different everywhere and it's usually not easy.


Did you publish your name change in the Diarrhea Times?

https://sludgefeed.com/full-first-issue-diarrhea-times-natha...


okay what

And if I had gotten to choose the newspaper, I would have chosen one that is more friendly and more representative of the local community than the glorified blog operated out of another state that I was ordered to use

