
jamesblonde · 2026-02-19T06:13:43 1771481623

They need to fix the addresses. In Stockholm, all of the companies are placed in the old town. At Hopsworks, we are in Sodermalm (hipster) - we are not old school money.

mikae1 · 2026-02-19T06:20:49 1771482049

Was just looking at the map thinking: have they all moved to Gamla stan? :)

jamesblonde · 2026-02-06T13:27:26 1770384446

I gave a talk at PyData Berlin on how to build your own TikTok recommendation algorithm. The TikTok personalized recommendation engine is the world's most valuable AI. It's TikTok's differentiation. It updates recommendations within 1 second of you clicking, i.e. at human-perceivable latency. If your AI recommender has poor feature freshness, it will be perceived as slow, not intelligent, no matter how good the recommendations are.

TikTok's recommender is partly built on European technology (Apache Flink for real-time feature computation), along with Kafka and distributed model training infrastructure. The Monolith paper is misleading in suggesting that 'online training' is the key. It is not. The key is that your clicks are made available as features for predictions in less than 1 second. You need a per-event stream processing architecture for this (like Flink; Feldera would be my modern choice as an incremental streaming engine).
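A minimal sketch of the freshness point, assuming a single process with in-memory state (no Flink, Feldera or Kafka involved; every name and the 60-second window are hypothetical): each click updates a per-user sliding window immediately, so the very next ranking call already sees it.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60.0  # hypothetical freshness window


class OnlineUserFeatures:
    """Per-user sliding-window click counts, updated on every event."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        # user_id -> deque of (timestamp, category), oldest first
        self.events: dict[str, deque] = defaultdict(deque)

    def on_click(self, user_id: str, category: str, ts: float) -> None:
        # Per-event update: no batch job, the feature changes immediately.
        q = self.events[user_id]
        q.append((ts, category))
        self._expire(q, ts)

    def _expire(self, q: deque, now: float) -> None:
        # Drop events that have fallen out of the window.
        while q and q[0][0] <= now - self.window:
            q.popleft()

    def category_counts(self, user_id: str, now: float) -> dict[str, int]:
        q = self.events[user_id]
        self._expire(q, now)
        counts: dict[str, int] = defaultdict(int)
        for _, category in q:
            counts[category] += 1
        return dict(counts)


def rank(candidates: list[tuple[str, str]], counts: dict[str, int]) -> list[str]:
    """Order (video_id, category) candidates by recent clicks in that category."""
    ordered = sorted(candidates, key=lambda c: -counts.get(c[1], 0))
    return [video_id for video_id, _ in ordered]


feats = OnlineUserFeatures()
feats.on_click("u1", "dogs", ts=0.0)
feats.on_click("u1", "dogs", ts=1.0)
feats.on_click("u1", "tennis", ts=2.0)
print(rank([("v1", "cooking"), ("v2", "tennis"), ("v3", "dogs")],
           feats.category_counts("u1", now=2.5)))  # ['v3', 'v2', 'v1']
```

A production system would shard this state and handle failures, which is where engines like Flink or Feldera come in; the sketch only shows why per-event updates make the next recommendation reflect the last click.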

* https://www.youtube.com/watch?v=skZ1HcF7AsM

* Monolith paper - https://arxiv.org/pdf/2209.07663

eddd-ddde · 2026-02-06T19:53:44 1770407624

I have to say, it is _extremely_ impressive when a tiktok I watched reminds me of some other tiktok, so I go and search for a very loose description of the tiktok, and the first result is 95% of the time what I wanted to find.

I don't think any single other platform has as good a search feature as TikTok does.

tridentboy · 2026-02-06T20:12:37 1770408757

oh wow, you're really lucky. around my friend groups who use tiktok, the main complaint is how bad the search is. unfortunately for us, getting a specific video is almost impossible =(

nik_0_0 · 2026-02-10T03:16:22 1770693382

That's super interesting (I deleted TikTok because it was too addicting!), but a common complaint about Instagram is that it feels impossible to find a reel based on keywords.

dmix · 2026-02-06T14:17:47 1770387467

I noticed YouTube Shorts also seems to update the feed based on how long you watched the last video. If you're scrolling quickly, then stop to watch a dog video long enough, the next one is likely to be another dog video.
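A toy illustration of that dwell-time behaviour, assuming (hypothetically; YouTube's actual signals are not public) that each topic's score is an exponential moving average of the fraction of a video actually watched, and the feed greedily serves the top-scoring topic:

```python
def update_interest(scores: dict[str, float], topic: str,
                    watched_s: float, length_s: float, alpha: float = 0.5) -> None:
    """Blend the fraction of the video actually watched into the topic's score."""
    completion = min(watched_s / length_s, 1.0)
    previous = scores.get(topic, 0.0)
    scores[topic] = (1 - alpha) * previous + alpha * completion


def next_topic(scores: dict[str, float]) -> str:
    # Greedy: serve more of whatever currently scores highest.
    return max(scores, key=scores.get)


scores: dict[str, float] = {}
update_interest(scores, "cooking", watched_s=1, length_s=30)  # scrolled past
update_interest(scores, "tech", watched_s=2, length_s=40)     # scrolled past
update_interest(scores, "dogs", watched_s=28, length_s=30)    # stopped to watch
print(next_topic(scores))  # dogs
```

Keep watching the dog videos it then serves and the dog score only climbs, which is the same self-reinforcing loop described in the next reply.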

kgeist · 2026-02-07T09:55:08 1770458108

It creates a weird feedback loop: after I watch video A, it recommends a similar video B, and if I make the mistake of watching that too, it then recommends video C on the same topic. Suddenly my feed is nothing but Stranger Things shorts for two whole days (literally not a single video about anything else). Skipping or disliking didn't help, then somehow it went back to normal after two days.

randysalami · 2026-02-06T15:31:26 1770391886

I’ve noticed the same thing and this creates such a negative user experience. Every short is a reaction test and if I fail, I get slop. Makes the whole experience very jarring (for better or for worse).

datsci_est_2015 · 2026-02-06T15:53:13 1770393193

For better or worse with regards to my addiction, my subscriptions are all either science channels or high-effort / high-production comedy skits (e.g. DropoutTV). I still get slop, but I never subscribe and it mostly remains background noise.

AreShoesFeet000 · 2026-02-06T16:27:47 1770395267

That’s the point though. It may seem as if you’re not in control when scrolling, but you can adjust your behavior to get the content you’re looking for almost intuitively. That’s actually something good in my honest opinion.

Jensson · 2026-02-06T16:31:50 1770395510

Why is it good that you need self-control to avoid slop? It's much better if you can just turn that off and relax, rather than having to stay alert to avoid content the app uses to trick you into watching more slop.

Distancing yourself from temptations is an effective and proven way to get rid of addictions; programs constantly trying to get you to relapse are not a good feature. Imagine a fridge that constantly restocks itself with beer: that would be very bad for alcoholics, and people would just say "just don't drink the beer?" even though this is a real problem with an easy fix.

james_marks · 2026-02-06T17:13:02 1770397982

Basically, I want to set boundaries in a healthy frame of mind, and have that default respected when my self control is lower because I’m tired, depressed, bored, etc.

“The algorithm” of social media is the opposite.

AreShoesFeet000 · 2026-02-06T17:34:31 1770399271

I think your reply has me convinced. You really can’t expect to have such self control all of the time. Damn.

AreShoesFeet000 · 2026-02-06T16:40:19 1770396019

It’s because content curation inherently can't reach the same level of relevance as direct feedback from user behavior. You mix in all kinds of biases, commercial interests, the curator's ideology, etc., and you inevitably get irrelevant slop. The algorithm puts you in control a little bit more.

Jensson · 2026-02-06T16:44:21 1770396261

> The algorithm puts you in control a little bit more.

Why not let you choose a less addictive algorithm? Older algorithms were less addictive, so it's not at all impossible to do this, and many users would want it.

kylecazar · 2026-02-06T17:12:24 1770397944

They're optimizing for time spent on the platform.

Jensson · 2026-02-06T17:19:50 1770398390

And that is why these algorithms need to be regulated. People don't want to pick the algorithm that makes them spend the most time possible on their phones; many would want an algorithm that optimizes for quality rather than quantity of time on the app, so they have more time for other things. But corporations don't want to provide that because they don't earn anything from it.

direwolf20 · 2026-02-07T12:54:29 1770468869

I have YouTube Premium. They should be doing the opposite. Getting me off the platform as quickly as possible so they get to keep a bigger cut of my fixed payment.

AreShoesFeet000 · 2026-02-06T17:08:40 1770397720

I just don’t think that the addiction is exclusively due to the algorithm. There’s really a lack of affordable varied options for learning trade and entertainment. We say in Portuguese: You shouldn’t throw the baby away along with the water you used to bathe.

Jensson · 2026-02-06T17:21:59 1770398519

I don't see any harm that could come from saying "a less addictive algorithm needs to be available to users"? For example, let's say there is an option to only recommend videos from channels you subscribe to. That would be much less addictive, so why isn't that an option? A regulation that forces these companies to add such a feature would only make the world a better place.

fsckboy · 2026-02-06T20:43:57 1770410637

>I don't see any harm that could come from saying "a less addictive algorithm needs to be available to users"?

consider air travel in the present day. ticketing at essentially all airlines breaks down as: premium tickets that are dramatically expensive but offer comfortable seats, and economy tickets that are cramped and seem to impose new indignities every new season. what could be the harm from legislation that would change that menu?

the harm would be fewer people able to travel, fewer young people taking their first trip to experience the other side of the world, fewer families visiting grandma, etc.

As much as people hate the air travel experience, the tickets get snapped up, and most of them strictly on the basis of price, and next most taking into account nonstops. This gives us a gauge as to how much people hate air travel: they don't.

this doesn't mean airlines should have no regulation, it doesn't mean monopoly practices are not harmful to happiness, it doesn't mean that addictions don't drive people to make bad choices, it doesn't mean a lot of things.

I'm just trying to get you to see that subtle but significant harm to human thriving can easily come from regulations.

_DeadFred_ · 2026-02-07T01:57:24 1770429444

'we gotta keep lead in gas'

AreShoesFeet000 · 2026-02-06T17:24:38 1770398678

I agree, but what would be the actual mechanism that would allow that? I believe we’re out of ideas. TikTok’s crime was just being solidly successful because of good engineering. There’s no evil sauce apart from promotional content and occasional manipulation, which has nothing to do with the algorithm per se.

And about whitelisting, I honestly don’t think you’re comparing apples to apples. The point of the algorithm is dynamically recommending new content. It’s about discovery.

Jensson · 2026-02-06T17:30:48 1770399048

> I agree, but what would be the actual mechanism that would allow that?

Governments saying "if you are a social content platform with more than XX million users you have to provide these options on recommendation algorithms: X Y Z". It is that easy.

> And about whitelisting, I honestly don’t think you’re comparing apples to apples. The point of the algorithm is dynamically recommending new content. It’s about discovery.

And some people want to turn off that pushed discovery and just get recommended videos from a set of channels that they subscribed to. They still want to watch some tiktok videos, they just don't want the algorithm to try to push bad content on them.

You are right that you can't avoid such algorithm when searching for new content, but I don't see why it has to be there in content it pushes onto you without you asking for new content.

AreShoesFeet000 · 2026-02-06T17:46:15 1770399975

Fair enough. I’m not really a fan of regulation. The capitalist State is a total mess, but I really think we should try your idea.

cwillu · 2026-02-06T19:36:57 1770406617

We're allowed to create laws to avoid a result we don't like, regardless of how many good intentions paved the road that brought us to that result.

_DeadFred_ · 2026-02-07T01:58:23 1770429503

Leaded gasoline was great engineering as well. Doesn't mean we continued to allow it to poison people.

https://news.ycombinator.com/item?id=46865275

AreShoesFeet000 · 2026-02-07T08:36:16 1770453376

Given the amount of vehicular accidents I think we haven’t even gone far enough and banned cars altogether.

Wyverald · 2026-02-06T22:34:38 1770417278

For the record, almost the exact same expression exists in English: https://en.wikipedia.org/wiki/Don%27t_throw_the_baby_out_wit...

Forgeties79 · 2026-02-06T16:32:16 1770395536

I don’t agree tbh. This is part of how people wind up down extremist rabbit holes. If you’re just lazily scrolling it can easily trap you in its gravity well.

AreShoesFeet000 · 2026-02-06T16:38:12 1770395892

But you can get into extremist rabbit holes independently of control surface. Remember 4chan? Dangerous content is a matter of moderation regardless of interfacing.

boelboel · 2026-02-06T23:35:26 1770420926

4chan has a lot less extremism than people imagine, especially compared to platforms like Instagram or Facebook. It's mostly concentrated on certain boards. The reputation of being extremist did more 'in favour' of its extremism than the original userbase and design ever did.

Forgeties79 · 2026-02-07T01:56:04 1770429364

4chan is only outdone by 8chan. “It’s only concentrated on certain boards” is the same lame excuse Reddit used to ignore /r/thedonald and now /r/conservative.

boelboel · 2026-02-07T05:39:50 1770442790

4chan doesn't use algorithms to push users to certain boards afaik, makes it better than the others in its design. I'm not arguing 4chan is great but it's not nearly as impactful as Facebook, Twitter or TikTok in creating extremism.

Forgeties79 · 2026-02-07T13:30:26 1770471026

So you believe 4chan (and its cousin boards) didn’t/dont foster extremism?

chasing0entropy · 2026-02-07T15:40:11 1770478811

Facebook and Twitter are far worse sources of extremism. There are entire groups dedicated to genetic comparisons between races, 'who would you do' groups that do nothing but photos of young women in bikinis farmed FROM facebook/ig.

4chan is where you go too far. 4chan users typically don't foster extremism, they are the extreme. They don't post pictures of young women, they post addresses and walkthroughs of their apartments.

Forgeties79 · 2026-02-07T16:07:21 1770480441

so it’s a place where people go when they’re already radicalized but it doesn’t radicalize anybody on it? Is that the argument?

boelboel · 2026-02-10T20:57:56 1770757076

Yes, I feel like it's far less harmful than the other sites for this reason. These bad parts of 4chan aren't the majority of the site either, a large minority maybe, but the site in general is much smaller. Users are also attracted to the image of 'extremism'; in its early days, 4chan's newcomers weren't mainly drawn by that.

It's easier for governments to control than Facebook/Reddit/..., because it's just some boards, which is way better than massive amounts of posts creating a personal zone for everyone.

krapp · 2026-02-07T12:04:34 1770465874

>I'm not arguing 4chan is great but it's not nearly as impactful as Facebook, Twitter or TikTok in creating extremism.

4chan has /pol/. 4chan inspired Gamergate, Pizzagate, QAnon and numerous incidents of extremist violence. Those other platforms mostly just spread and accelerate the toxic culture that originated on 4chan.

boelboel · 2026-02-10T21:10:45 1770757845

I'm not sure most of 4chan was actually so on board with the whole Gamergate thing and everything that followed. Pre-/pol/ 4chan was a whole different thing. It was outsiders joining 4chan who did most of the posting; Twitter and Facebook were the ones that allowed this to happen.

An internet starting with 1000 4chans wouldn't create what we have today (you'd just get lots of small fringe groups); an internet starting with 1000 Facebooks/Twitters/... will always end in extremism among a big portion of the population.

direwolf20 · 2026-02-07T13:01:38 1770469298

And — this is really shocking — Jeffrey Epstein caused /pol/ to exist, which makes him indirectly responsible for almost all stupid internet politics of the last decade.

https://www.msn.com/en-us/news/technology/epstein-met-4chan-...

Forgeties79 · 2026-02-06T16:43:03 1770396183

4chan is nothing like TikTok, though yes I agree heavy moderation is necessary for both.

xp84 · 2026-02-06T16:32:02 1770395522

I try to react as “violently” as possible to any slop and low-quality crap (e.g. stupid “life hacks” purposely bad to ragebait the comments). On YouTube it’s called “Don’t recommend this channel” and on Facebook it’s multiple taps but you can “Hide All From…” Basically, I don’t trust that thumbs down is sufficient. It is of course silly, since there are no doubt millions of bad channels and I probably can’t mute them all.

MengerSponge · 2026-02-06T16:34:20 1770395660

They built a slop machine, not something tuned for positive UX.

https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...

HappMacDonald · 2026-02-07T00:49:22 1770425362

At the risk of going off on a tangent about that maxim; I feel like it's just misusing the word "purpose".

Maybe it would be cleaner to state that a system has no purpose (at least not until it is sentient); instead it has behaviors. Then one can observe that the purpose of the designers or maintainers of a system simply happens to be at odds with (or, as AI safety researchers would say, "out of alignment with") the behavior of the system.

That all of course presupposes that one can accurately deduce the purposes of the designers/maintainers.. In the case of TikTok, I'd bet that we are all in agreement that their purpose is nothing more nor less than maximal value-extraction from people wishing to express themselves with videos multiplied against an audience of people who wish to view videos multiplied again against advertisers who want to insert propaganda into eyeballs.

direwolf20 · 2026-02-07T13:02:23 1770469343

If a system is not fulfilling its purpose, the system is changed. If the system is not changed, it is fulfilling its purpose.

pandemic_region · 2026-02-06T15:20:07 1770391207

I've been insta-skipping tennis videos for months now. Still getting Federer on a daily basis.

beAbU · 2026-02-06T18:17:08 1770401828

Facebook does the same. The longer I dwell on an image post, the more likely the next batch of posts is to be similar.

coliveira · 2026-02-06T18:43:44 1770403424

The right way to look at these networks is that people are being trained by the algorithm, not the other way around. The ultimate goal is to elicit behaviors in humans, normally to spend more time and more money on the platform, but also for other goals that may be designed by the owners of the network.

touristtam · 2026-02-06T20:04:48 1770408288

Is amazon using the same thing??? I can't count the number of times I am getting recommended the EXACT same type of product I just purchased.

beAbU · 2026-02-07T11:31:06 1770463866

On amazon.ie I'm convinced they are running only two ads, because all I'm ever seeing are ads for grime brushes and window squeegees. Literally nothing else.

BoxOfRain · 2026-02-06T18:14:40 1770401680

One of my gripes with youtube at the moment is that they break my adblock filters to remove shorts more often than they break the filters stopping the actual ads.

sidharthv · 2026-02-06T18:25:05 1770402305

I naively searched in the mobile app settings for a way to turn off shorts, before realising there will not be one.

ffsm8 · 2026-02-06T19:37:33 1770406653

You can't turn it off entirely, but if you keep using "show less shorts" from the 3 dot menu it eventually goes away, mostly...

direwolf20 · 2026-02-07T13:04:01 1770469441

It comes back. It acts like it executed shortiness+=1 every day, and "show fewer shorts" does shortiness-=10 or thereabouts. The shorts position on the home screen is based on this hidden shortiness variable. It always bubbles back to the top unless you keep pressing "show fewer shorts" whenever you see it.
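The guessed mechanism written out as a toy model. The +1 per day and -10 per press come from the comment above; the class and the "on top" threshold are hypothetical, not YouTube's actual logic.

```python
from dataclasses import dataclass


@dataclass
class ShortsShelf:
    """Toy model of the guessed 'shortiness' counter."""
    shortiness: int = 0
    daily_drift: int = 1    # "shortiness += 1 every day"
    penalty: int = 10       # "show fewer shorts" is roughly "shortiness -= 10"
    top_threshold: int = 0  # hypothetical: shelf sits at the top at or above this

    def new_day(self) -> None:
        self.shortiness += self.daily_drift

    def show_fewer(self) -> None:
        self.shortiness -= self.penalty

    def at_top(self) -> bool:
        return self.shortiness >= self.top_threshold


shelf = ShortsShelf()
shelf.show_fewer()
days = 0
while not shelf.at_top():
    shelf.new_day()
    days += 1
print(days)  # 10
```

Under these assumptions, one press buys about ten days before the shelf bubbles back to the top.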

ffsm8 · 2026-02-07T15:11:58 1770477118

Strange, for me it only comes back up if I get baited into clicking on any short (or have one linked to me)

notpushkin · 2026-02-07T05:51:39 1770443499

You can use NewPipe or YouTube ReVanced, or set up a “child” account and disable Shorts in the parental restrictions settings.

mghackerlady · 2026-02-06T20:35:23 1770410123

If you're on android it's better to just use revanced or something

satiric · 2026-02-06T22:33:56 1770417236

I use an addon for firefox called "Hide shorts for Youtube™". Works just fine.

exq · 2026-02-07T13:06:31 1770469591

I use this add-on to redirect shorts to the regular video URL. I can still watch them but I don't get the infinite scroll and algorithm pigeon-holing.

https://addons.mozilla.org/en-US/android/addon/youtube-short...

rjh29 · 2026-02-07T09:55:54 1770458154

youtube's algorithm seems to be "oh you watched this video? now here's every other video by this creator, pretty much without a break, until you downvote it"

It never reliably gives me videos similar but not exactly the same, i.e. things I might be interested in.

sensanaty · 2026-02-07T10:29:09 1770460149

For me it's the same exact 5 videos on repeat, over and over and over again. I've gotten into a loop a lot of times, where it'll autoplay the same video I just watched. It's absolute madness.

vjerancrnjak · 2026-02-06T15:09:00 1770390540

Flink is too slow for this.

If by features you mean tracking state per user, that can be done insanely fast without Flink, e.g. with Redis.

If you're saying they don't have to load data to update the state, I don't see these states being so massive that they require in-memory updates, and even if they did, you could just do in-memory updates without Flink.

Similarly, any consumer will have to deal with batches of users and pipelining.

Flink is just a bottleneck.

If they actually use Flink for this, it's not the moat.

btown · 2026-02-06T16:38:43 1770395923

Yea, the Monolith paper by Bytedance uses Flink but they only say it's in use for their B2B ecommerce optimization system. Maybe this is intentional ambiguity, but I'd believe that they wouldn't rely on something like Flink for their core TikTok infrastructure.

My hunch is we start to learn a lot more about the core internals as Oracle tries to market to B2B customers, as Oracle is wont to do!

vjerancrnjak · 2026-02-06T17:59:52 1770400792

Flink is not really a performance choice, it's bloat to throw software as fast as possible at problems. I don't think there's any benchmark demonstrating insane capabilities per machine. I definitely couldn't get it to any numbers I liked, given other stream processing / state processing engines that exist (if compute and inmemory state management is the goal). Pretty sure any pathway that touches RocksDB slows everything down to 1-10k events per second, if not less.

The problem of finding out which video is next, by immediately taking into account the recent user context (and other user context) is completely unrelated to what Flink does -- exactly-once state consistency, distributed checkpoints, recovery, event-time semantics, large keyed state. I would even say you don't want a solution to any of the problems Flink solves, you want to avoid having these problems.

lsuresh · 2026-02-06T17:35:17 1770399317

Thanks for the Feldera shoutout Jim.

For anyone else, if you want to try out Feldera and IVM for feature-engineering (it gives you perfect offline-online parity), you can start here: https://docs.feldera.com/use_cases/fraud_detection/

bobek · 2026-02-06T18:09:42 1770401382

It is not only recommender though. These guys [1] seem to be able to react pretty quickly and not to create addicts on the way ;(

[1] https://recombee.com

miohtama · 2026-02-06T15:13:03 1770390783

TikTok's differentiation is the userbase of all teenagers in the world.

wongarsu · 2026-02-06T16:38:34 1770395914

But go just one layer deeper to 'why is every teenager using Tiktok' and the primary answer once again becomes 'Tiktok's recommendation engine'

godshatter · 2026-02-06T20:43:17 1770410597

I'm not a TikTok user, but I'm assuming the recommendation engine is there to keep eyeballs on more ads for longer. Maybe we should be regulating how often and how many ads can be shown on social media, especially to teens and kids.

cloverich · 2026-02-06T21:24:45 1770413085

They are arguing it's not the recommender that is unique; it is the network effect.

wongarsu · 2026-02-06T21:53:17 1770414797

If it was only network effect, then how did TikTok grow in a space where Instagram and Youtube were already much bigger players? How did they gain that user base?

Network effect helps, but it only explains why they stay big, not how they got big.

There really isn't that much making TikTok unique. Yes, their app is well designed. Yes, stitches and video replies make for great social/parasocial features because creators are actually interacting with each other and the community, almost like tumblr. But in my opinion those are reasons number three and two why TikTok is successful. Their recommendation algorithm is number one, by a wide margin.

miohtama · 2026-02-07T01:49:09 1770428949

Because back in the day, Instagram and YouTube were shitty products, and by some metrics they still are.

expedition32 · 2026-02-06T18:39:53 1770403193

No the primary answer is "teenagers do what other teenagers do". Remember we are advanced apes no more no less.

There is this curious word "influencer" which everyone uses but few ever think about what it really means.

direwolf20 · 2026-02-07T13:05:33 1770469533

That's not an answer, that's a thought–terminating cliché.

AlienRobot · 2026-02-06T15:39:56 1770392396

It also provides different opportunities for growth compared to other social media. A video that gets over half a million views on TikTok may not get 5 thousand on Youtube, or even 10 views on Instagram or Facebook.

bigfishrunning · 2026-02-06T21:19:10 1770412750

Isn't the inverse true though? it's not as if nobody's watching youtube, it's just that different videos are popular there.

dehue · 2026-02-07T10:59:08 1770461948

It's not just different videos; TikTok is much better at recommending videos by very small creators and people with no followers. On Instagram or Facebook, if you don't already have a large following, you most likely won't get any views at all, no matter how well your video matches the platform. YouTube often pushes big creators that have already made it big, while TikTok allows me to discover new and niche ones.

notyourwork · 2026-02-06T16:38:25 1770395905

That didn’t happen by accident, though.

3abiton · 2026-02-06T18:23:22 1770402202

It's interesting how they found that the "lifetime" of a feature is itself a feature. Meta-features are real.
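The Monolith paper describes an embedding table where IDs only get an embedding once they have been seen often enough (frequency filtering) and are evicted after a period of inactivity (expirable embeddings). A minimal in-memory sketch of that idea; the thresholds, initialisation and names are hypothetical, not the paper's implementation:

```python
import random
from typing import Optional


class ExpirableEmbeddingTable:
    """Toy embedding table with frequency filtering and inactivity expiry."""

    def __init__(self, dim: int, min_count: int, ttl: float, seed: int = 0) -> None:
        self.dim = dim
        self.min_count = min_count  # sightings needed before allocating a vector
        self.ttl = ttl              # seconds of inactivity before eviction
        self.rng = random.Random(seed)
        self.counts: dict[int, int] = {}
        self.table: dict[int, tuple[list[float], float]] = {}  # id -> (vector, last_seen)

    def lookup(self, feature_id: int, now: float) -> Optional[list[float]]:
        self.counts[feature_id] = self.counts.get(feature_id, 0) + 1
        if feature_id in self.table:
            vector, _ = self.table[feature_id]
            self.table[feature_id] = (vector, now)  # refresh last_seen
            return vector
        if self.counts[feature_id] < self.min_count:
            return None  # too rare so far: no embedding allocated
        vector = [self.rng.gauss(0.0, 0.01) for _ in range(self.dim)]
        self.table[feature_id] = (vector, now)
        return vector

    def evict_expired(self, now: float) -> int:
        stale = [fid for fid, (_, seen) in self.table.items() if now - seen > self.ttl]
        for fid in stale:
            del self.table[fid]
            self.counts.pop(fid, None)  # an evicted ID must earn its way back in
        return len(stale)


table = ExpirableEmbeddingTable(dim=4, min_count=2, ttl=3600.0)
print(table.lookup(42, now=0.0))          # None (seen only once)
print(len(table.lookup(42, now=10.0)))    # 4
print(table.evict_expired(now=10_000.0))  # 1
```

The "lifetime" (time since last seen, and how often an ID appears) decides whether a feature exists at all, which is one concrete sense in which it behaves like a feature of its own.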

not_ai · 2026-02-06T21:32:58 1770413578

I’m happy to see that Flink is in this stack, I wish that Pulsar was as well instead of Kafka.

permo-w · 2026-02-07T03:33:10 1770435190

I'm sorry to point out the obvious here, but who is going to perceive their recommended feed as slow or unfresh if it doesn't learn from exactly the last video you clicked on within 1 second? The bar simply is not that high. The special sauce of TikTok is how it chooses the videos, not the speed it does it at. I'm sure the speed helps to give it that "spookily intelligent" feeling, but that's a cherry on the recommendation cake, a cake which is already twice as good as the nearest competitor. I'm sure your talk goes deeper than this, but if this is the main focus, then you've missed the point.

owenversteeg · 2026-02-07T18:37:20 1770489440

I partly agree and disagree.

Speed completely changes the game in a few ways. The first is identifying interests. Imagine every possible interest in a tree structure. Let's say you're into kumiko. There are so many levels of the tree to traverse to find kumiko; perhaps Skilled crafts -> Woodworking -> Japanese -> Construction without use of fasteners -> Panels and decorative elements -> Kumiko. The more iterations you can get through, the better you can match people's interests. If someone has 10 interests and each one requires many questions to determine, it can take forever to find exact interests with a system that only narrows down your interests every X videos vs. after each video.
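The tree argument can be made concrete with a toy simulation. It assumes, hypothetically, a recommender that probes the children of its current node round-robin and can only move one level down at each model update; the tree below is made up apart from the kumiko path from the comment.

```python
TREE = {
    "root": ["Sports", "Skilled crafts", "Cooking"],
    "Skilled crafts": ["Woodworking", "Pottery"],
    "Woodworking": ["Scandinavian", "Japanese"],
    "Japanese": ["Construction without use of fasteners", "Carving"],
    "Construction without use of fasteners": ["Joinery", "Panels and decorative elements"],
    "Panels and decorative elements": ["Shoji", "Kumiko"],
}
KUMIKO = ["Skilled crafts", "Woodworking", "Japanese",
          "Construction without use of fasteners",
          "Panels and decorative elements", "Kumiko"]


def videos_to_reach(tree: dict[str, list[str]], target_path: list[str],
                    update_every: int) -> int:
    """Videos shown before the recommender settles on the last node of target_path."""
    node, depth, probe, shown = "root", 0, 0, 0
    while depth < len(target_path):
        children = tree[node]
        if target_path[depth] not in children:
            raise ValueError(f"{target_path[depth]!r} is not a child of {node!r}")
        engaged = None
        for _ in range(update_every):  # videos served between two model updates
            child = children[probe % len(children)]
            probe += 1
            shown += 1
            if child == target_path[depth]:
                engaged = child  # the user watches this one and skips the rest
        if engaged is not None:  # the model update finally acts on the signal
            node, depth, probe = engaged, depth + 1, 0
    return shown


print(videos_to_reach(TREE, KUMIKO, update_every=1))   # 10
print(videos_to_reach(TREE, KUMIKO, update_every=20))  # 120
```

In this toy model, updating after every video reaches the kumiko niche after 10 videos, while updating every 20 videos costs 120 for the same six levels; with once-a-day updates the same descent would take six days.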

The second is matching current moods. Let's say you just broke up with your girlfriend, or your pet fish died, or you're on vacation in Spain. A rapidly-updating system can capture those trends and get right to the heart of them in time for them to matter. A slow system might only get through a few iterations and capture a vague interest in Spain; a fast-updating one can get through countless iterations of guessing. Spain? What city? Tourist or moving there? What type of tourist? Foodie? What type of food? How fancy? Bam, you're watching the perfect video about an upscale seafood restaurant in Barcelona.

The third is type and flavor of content. Even inside of a small niche you will find many flavors of content. Super-short or long form, fast paced or slow, funny or serious, intellectual, irreverent, political leanings, background music, et cetera. Maybe you like slow long-form woodworking content but like fast-paced travel guides. Maybe you hate background music except when it's in skateboarding videos. To determine this requires an incredible amount of "questioning" of the user.

Now, of course, an algorithm that updates once daily can also make inferences about your interests and preferences. It can certainly learn, with enough time, what you are into and how you like to consume it. But the key thing is that these inferences only enable _predetermined_ changes. Imagine you are a human showing someone TikToks. Imagine that you can ask them any questions about their preferences right as they watch a video. You may not ask a question after every video, but you will ask countless questions over the hours of scrolling that day, and you will get good data. Now imagine a new restriction: you must decide your questions once a day in advance. You will manage far fewer questions; and to follow up on them you must wait yet another day.

Now, why do I partly agree? Well, I don't think speed is everything; I think TikTok has another sort of je ne sais quoi to it. I think it has a unique culture and community. It has a better UI and better features than Instagram. It has a young and cool reputation, far from the Millennial taint of Instagram or Facebook. And I suspect that they are good at identifying _who_ you are and acting on that information. But in my eyes, the speed could very well be the most important part of the puzzle.

ryanjshaw · 2026-02-06T14:20:48 1770387648

Great insight. Any thoughts on RisingWave?

jamesblonde · 2026-02-06T19:19:25 1770405565

That, too, and Materialize. Feldera is my favourite, though.

SpaceManNabs · 2026-02-06T22:35:34 1770417334

apache flink is so good. i think netflix used it heavily in 2018. not sure about now.

cactusplant7374 · 2026-02-06T19:25:56 1770405956

I thought this was secret information. How long has it been publicly known?

permo-w · 2026-02-07T03:35:37 1770435337

The secret is how it chooses videos to recommend, not the tech stack it uses and how fast it is

cactusplant7374 · 2026-02-08T22:17:49 1770589069

I would assume it's time spent on video and they start to build a profile of which users like what kind of content. X liked video 934934 so Y probably also likes that kind of video. Group people in buckets.
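What this describes is roughly textbook user-based collaborative filtering. A minimal sketch using Jaccard similarity between users' like sets; this is not known to be what TikTok does, and the data and names are made up:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two like sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes: dict[str, set[str]], user: str, k: int = 1) -> list[str]:
    """Score videos the user hasn't liked by the similarity of users who did."""
    mine = likes[user]
    scores: dict[str, float] = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # no shared taste, no vote
        for video in theirs - mine:
            scores[video] = scores.get(video, 0.0) + sim
    # Highest score first; ties broken by video id for determinism.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


likes = {
    "X": {"934934", "111", "222"},
    "Y": {"111", "222"},
    "Z": {"333"},
}
print(recommend(likes, "Y"))  # ['934934']
```

The "buckets" in the comment would correspond to users with high mutual similarity; real systems replace the pairwise loop with learned embeddings because comparing every pair of users does not scale.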

permo-w · 2026-02-09T05:02:06 1770613326

I'm sure this is part of it, but I suspect it goes deeper than that. I'd guess they probably have some kind of automated categorisation algorithm that can extract features from the videos.

jamesblonde · 2026-01-20T04:50:26 1768884626

it's whitewashing. The US Cloud Act means they can still take the data.

jamesblonde · 2026-01-19T06:55:06 1768805706

I got turned off in the first paragraph with the misuse of the term "back pressure". "back pressure" is a term from data engineering to specifically indicate a feedback signal that indicates a service is overloaded and that clients should adapt their behavior.
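For reference, the data-engineering sense of backpressure in its simplest form: a bounded buffer that refuses work when full, and a producer that adapts instead of piling on. The retry-with-exponential-backoff policy and the function names are illustrative choices, not a specific system's API.

```python
import queue
import time


def offer(q: queue.Queue, item) -> bool:
    """Non-blocking put: False is the backpressure signal ("I'm full, back off")."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


def produce_with_backoff(q: queue.Queue, item, retries: int = 5,
                         base_delay: float = 0.01) -> bool:
    """A client that adapts to the signal instead of pushing harder."""
    delay = base_delay
    for _ in range(retries):
        if offer(q, item):
            return True
        time.sleep(delay)
        delay *= 2  # exponential backoff while the consumer is overloaded
    return False


buffer: queue.Queue = queue.Queue(maxsize=2)  # bounded: this is what creates the pressure
print([offer(buffer, i) for i in range(4)])  # [True, True, False, False]
```

The point of the term is that the signal flows upstream from the overloaded service and changes the client's behaviour, which is narrower than feedback in general.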

Backpressure != feedback (the more general term). And in the agentic world, we use the term 'context' to describe information used to help LLMs make decisions, where the context data is not part of the LLM's training data. Then, we have verifiable tasks (what he is really talking about), where RL is used in post-training in a harness environment to use feedback signals to learn about type systems, programming language syntax/semantics, etc.

epolanski · 2026-01-19T10:40:03 1768819203

I find your kind of comment pointless.

The term back pressure actually comes from mechanical engineering in the context of steam engines.

It first appeared in a dictionary 160 years ago.

Words are just words. Mathematicians very well understand that words mean nothing, what matters are definitions and the author provides one.

E.g. natural numbers may or may not contain the number 0, but that's irrelevant, because what mathematicians care about are definitions, so they will state whether their natural numbers start at 0 or at 1 and avoid arguing about labels. You can call them funky numbers or neet numbers, doesn't matter.

Same applies here. Your comment is pointless because the author does provide a definition for back pressure in the context of his blog post and what matters is discussing the concept he labels in the context of LLMs.

paradite · 2026-01-19T07:21:14 1768807274

If you want to be pedantic:

Context is also a misnomer; in fact it's just part of the prompt.

Prompt itself is also a misnomer; in fact it's just part of the model input.

Model input is also a misnomer; in fact it's just the first input token + prefill for the model output to generate more output.

Harness is also a misnomer; it's just scaffold / tools around the model input/output.

patates · 2026-01-19T07:14:12 1768806852

We all live in our own various small circles, in which many terms get misused. "Isomorphic" in front-end circles, for example, means something completely different than in any other usage. This is how languages evolve.

I'm not trying to discount any attempt to correct people, especially when it gets confusing (like here; honestly, I was also confused), but we could phrase it more nicely, IMHO.

SubiculumCode · 2026-01-19T07:16:32 1768806992

It is perhaps more generally known in the plumbing sense of pressure causing resistance to the desired direction of flow, but yeah, a poor word choice... at least it isn't AI-written, though.

oofbey · 2026-01-19T07:16:01 1768806961

Ironic that you nitpick the author’s word choice of “back pressure” and then completely misuse the term RL in your complaint.

ghuntley · 2026-01-19T08:22:46 1768810966

the back pressure terminology comes from me. essentially it’s the wheel - you need to add backpressure to the agentic flywheel.

see https://ghuntley.com/pressure

i have the pleasure of working with moss, and he came up with a way to explain what is in my head with ease.

atoav · 2026-01-19T07:03:27 1768806207

Well it does sound technical.

jamesblonde · 2026-01-15T11:04:05 1768475045

If they are exploding categorical variables using one-hot encoding (OHE) and storing the resulting columns, that is the wrong thing to do. You should only ever store untransformed feature data in tables. You apply feature transformations, like OHE, when reading from the tables, because those transformations are parameterized by the data you read (the training data subset you select).
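A minimal stdlib sketch of "store raw, transform on read", assuming a hypothetical in-memory table with a `day` column used to select the training subset. The point it illustrates: the OHE vocabulary (and therefore the set of columns) is fitted on whichever subset you read, so two different training windows over the same raw table can yield different encodings. Precomputing and storing one set of exploded columns would freeze a single vocabulary. The function names are illustrative, not a Hopsworks API.

```python
from typing import Dict, List, Sequence, Tuple


def fit_one_hot(values: Sequence[str]) -> List[str]:
    """Learn the category vocabulary from the data actually read (e.g. a training split)."""
    return sorted(set(values))


def transform_one_hot(values: Sequence[str], vocab: List[str]) -> List[List[int]]:
    """Encode values with a fitted vocabulary; unseen categories become all-zero rows."""
    index = {cat: i for i, cat in enumerate(vocab)}
    encoded = []
    for v in values:
        row = [0] * len(vocab)
        if v in index:
            row[index[v]] = 1
        encoded.append(row)
    return encoded


def read_training_features(
    table: List[Dict[str, object]], start: int, end: int, column: str
) -> Tuple[List[str], List[List[int]]]:
    """Select a training subset from the raw (untransformed) table, then fit and apply OHE on read."""
    subset = [str(row[column]) for row in table if start <= int(row["day"]) < end]
    vocab = fit_one_hot(subset)
    return vocab, transform_one_hot(subset, vocab)


# Hypothetical raw feature table: categories are stored as-is, not exploded.
RAW_TABLE = [
    {"day": 1, "country": "SE"},
    {"day": 2, "country": "DE"},
    {"day": 3, "country": "SE"},
    {"day": 4, "country": "FR"},
]

if __name__ == "__main__":
    print(read_training_features(RAW_TABLE, 1, 3, "country"))  # (['DE', 'SE'], [[0, 1], [1, 0]])
    print(read_training_features(RAW_TABLE, 1, 5, "country")[0])  # ['DE', 'FR', 'SE']
```

At serving time you would reuse the vocabulary fitted on the training subset (the same `transform_one_hot` with the saved `vocab`) rather than refitting on live data.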

Reference: https://www.hopsworks.ai/post/a-taxonomy-for-data-transforma...

jamesblonde · 2026-01-11T19:24:16 1768159456

And the Wallenberg family.

jamesblonde · 2026-01-09T13:49:31 1767966571

Subsidiarity has been a key building block of the EU, but it has failed the EU for unexpected reasons. Subsidiarity was pursued for accountability and to make the EU less centralized: decisions should be made at the lowest, most local level possible, with central authorities only stepping in when a task cannot be effectively handled locally. However, it means that here in Sweden, government bodies are all individually moving to Azure, because each one makes that decision locally, in its own best interest. The same thing has happened all over the EU, and very few government bodies would ever take the risk of investing in EU cloud or data platforms. We need public procurement to help breathe life into the Eurostack.

jamesblonde · 2025-12-20T12:32:49 1766233969

It will be servers as well: Eurostack cloud providers. We are involved in one of these, and a large car company is doing the same.

jamesblonde · 2025-12-20T12:29:42 1766233782

They control Europe's digital infrastructure and are able to increase rent to usurious levels (tariffs!) because Europe is dependent on their digital services. Without digital sovereignty, Europe has no sovereignty and will quickly become a modern colony from which wealth is extracted.

pembrook · 2025-12-20T12:39:32 1766234372

The reason the US is able to raise rents (tariffs) has nothing to do with Europe buying US digital services.

The tariffs are on European exports. The problem is Europe has a weak domestic consumer market and is dependent on selling stuff to the US, not buying from them.

microtonal · 2025-12-20T13:38:46 1766237926

The EU has a services deficit compared to the US, the US has a goods deficit compared to Europe. Together, they are almost in balance, the difference is just 3% of total trade [1]. Put differently, the US and the EU need each other. This is why Trump is using footguns.

[1] https://policy.trade.ec.europa.eu/eu-trade-relationships-cou...

kyboren · 2025-12-21T17:08:45 1766336925

The problem is really that Europe has a few dozen weak consumer markets. If there really was a proper single market, I suspect the EU would be much more competitive in digital services.

Unfortunately, despite their best efforts, this isn't something Eurocrats can simply will into existence. The most important prerequisite is a common language, and there is zero political will to do the only sensible thing and establish English as the official common language of the EU.

jamesblonde · 2025-12-20T13:31:32 1766237492

Nonsense. Unilateral tariffs are not how trade agreements work. This is pure extractive rent.

The reason the US is not able to extract the same rents from China is that China has digital sovereignty and the US cannot just pull the cloud plug on it.

andsoitis · 2025-12-20T13:43:45 1766238225

> Nonsense. Unilateral tariffs are not how trade agreements work. This is pure extractive rent.

What do you mean by "unilateral tariffs"?

> The reason the US is not able to extract the same rents from China is that they have digital sovereignty and the US cannot just pull the cloud plug from them.

The US has higher tariffs against Chinese imports than European imports.

jamesblonde · 2025-12-13T16:44:28 1765644268

I agree that this is an anti-pattern for training. In training, you are often I/O bound over S3, and high-bandwidth networking alone doesn't fix it (.safetensors files are typically 4 GB in size). You need NVMe and high-bandwidth networking along with a distributed file system.

We do this with tiered storage over S3 using HopsFS, which has an HDFS API and a FUSE client, so training can just read data (from the HopsFS datanodes' NVMe cache) as if it were local, even though it is pulled from NVMe disks over the network. In contrast, writes go straight to S3 via the HopsFS write-through NVMe cache.
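A toy sketch of the read-through / write-through pattern described above, assuming plain dicts as stand-ins for S3 and the NVMe cache. This does not use any HopsFS API; `TieredStore` and its methods are hypothetical, and cache eviction, networking, and FUSE are not modeled. It only shows why repeated training reads hit the fast tier while the object store stays the source of truth for writes.

```python
from typing import Dict


class TieredStore:
    """Toy two-tier store: a fast cache (stand-in for NVMe) in front of a slow
    object store (stand-in for S3). Reads are read-through; writes are
    write-through, so the object store always holds the durable copy."""

    def __init__(self, object_store: Dict[str, bytes]) -> None:
        self.object_store = object_store   # hypothetical S3 stand-in
        self.cache: Dict[str, bytes] = {}  # hypothetical NVMe cache stand-in
        self.cache_hits = 0
        self.cache_misses = 0

    def read(self, key: str) -> bytes:
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]         # fast path: served from the cache
        self.cache_misses += 1
        data = self.object_store[key]      # slow path: fetch from the object store
        self.cache[key] = data             # populate the cache for later epochs
        return data

    def write(self, key: str, data: bytes) -> None:
        self.object_store[key] = data      # write-through: persist to the object store
        self.cache[key] = data             # keep the cached copy consistent


if __name__ == "__main__":
    s3 = {"shard-0.safetensors": b"weights"}
    store = TieredStore(s3)
    store.read("shard-0.safetensors")      # miss: pulled from the object store
    store.read("shard-0.safetensors")      # hit: served from the cache
    store.write("ckpt-1", b"state")        # lands in the object store and the cache
    print(store.cache_hits, store.cache_misses, "ckpt-1" in s3)  # 1 1 True
```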