Hacker News | ansk's comments

The guy writing a thumbnail pipeline isn't getting petabytes (exabytes?) of storage to cache all videos from the past week in their entirety. If this quantity of data is being stored, it's being stored deliberately and at significant cost.


The other explanations here don't explain the long delay between the start of the investigation and the release of the footage. Yes, storing customer data is what we'd expect from Google and yes, the FBI can coerce Google to provide this data for their investigations. But it does not take a week for Google to find a file on their servers.

My hunch is that Google initially tried to play dumb to avoid compliance, as to not reveal they do in fact retain customer data. They had a plausible excuse as well -- the owner had no subscription so they don't store the data -- and took a gamble that this explanation would suffice until the situation resolved itself. I suspect that authorities initially took Google's excuse at face value, since they parroted this explanation to the public as well. As pressure mounted on authorities to make some headway on the case, they likely formally exercised whatever legal mechanisms they have at their disposal to force Google's hand, and only then was the footage released.


This is a wild claim. I would think criminal charges for something like obstruction would be possible if Google intentionally hid this from investigators for up to a week. That could result in the difference between the victim being found alive or not.


The implication that OpenAI is a YC company in the same sense as the other listed companies is somewhere between misleading and dishonest. Even more distasteful to show founding teams for all the others, then just Sam for OpenAI.


Of all Schmidhuber's credit-attribution grievances, this is the one I am most sympathetic to. I think if he spent less time remarking on how other people didn't actually invent things (e.g. Hinton and backprop, LeCun and CNNs, etc.) or making tenuous arguments about how modern techniques are really just instances of some idea he briefly explored decades ago (GANs, attention), and instead just focused on how this single line of research (namely, gradient flow and training dynamics in deep neural networks) laid the foundation for modern deep learning, he'd have a much better reputation and probably a Turing award. That said, I do respect the extent to which he continues his credit-attribution crusade even to his own reputational detriment.


I think one of the best things to learn from Schmidhuber is that progress involves a lot of players and over a lot of time. Attribution is actually a difficult game and usually we are only assigning credit to those at the end of some milestone. It's like giving a gold medal to the runner in the last leg of a relay race or focusing only on the lead singer of a band. It's never one person that does it alone. Shoulders of giants, but those giants are just a couple of dudes in a really big trenchcoat.

Another important lesson is that often good ideas get passed over because of hype or politics. We often like to pretend that science is all about the merit and what is correct. Unfortunately this isn't true. It is that way in the long run, but in the short run there's a lot of politics and humans still get in their own way. This is a solvable problem, but we need to acknowledge it and create systematic changes. Unfortunately a lot of that is coupled to the aforementioned one.

  > I do respect the extent to which he continues his credit-attribution crusade even to his own reputational detriment.

As should we all. Clearly he was upset that others got credit for his contributions. But what I do appreciate is that he has recognized that it is a problem bigger than him, and is trying to combat the problem at large and not just his own little battlefield. That's respectable.


It's a bit of an aside but I believe this is one reason Zuckerberg's vision for establishing the superintelligence lab is misguided. Including VCs, too many people get distracted by rock stars in this gold rush.


Just last week I said something in line with that[0]. Many people conflated my claim that Meta has a lot of good people with "Meta /is/ winning the AI race". I only claimed they have who I think are some of the best researchers in the field, but don't give them nearly the resources or capacity to further their research that they give to these "rock stars". Tbh, the same is true of any top lab; I just think it happens more at Meta because Meta is so metric and rock star focused.

So I agree. The vision is misguided. I think they'd have done better had they taken that same money and thrown it at the people they already have who are working in different research areas. Everyone is trying to win by doing the same things. That's not a smart strategy. If you've got all that money, you've got to take risks. It's all the money dumped into research that got us to this point in the first place.

It's good to shift funds around and focus on what is working now, but you also have to have a pipeline of people working on what will work tomorrow, next year, in 5 years, and in 10. The people who can do that work are there. The people who want to do that work are there. What's missing is people willing to fund it. Unfortunately, it takes time to bake a cake.

Quite frankly, these companies also have more than enough money to do both. They have enough money to throw cash hand over fist at every wild and crazy idea. But they get caught in the hype, which is no different than an over focus on the attribution rather than the process or pipeline that got us the science in the first place.

[0] https://news.ycombinator.com/item?id=45554147


Also, it reminds us that the powerful write history. But history can be rewritten as the balance of power shifts. I imagine the world will hear all about China's contributions to the field if they continue their ascent.


> That said, I do respect the extent to which he continues his credit-attribution crusade even to his own reputational detriment.

Lol, I used to notice him railing against Bengio, Hinton, and LeCun even before covid. Can't believe he's still going.


I can only imagine what the Taiwanese can do in Arizona. Truly a synergy for the ages.


Maybe that's why yields there are better? [1]

[1] https://www.tomshardware.com/tech-industry/semiconductors/ts...


Once you're in an air-conditioned environment, the outside world doesn't matter.

More likely he compared the 4nm yield to the 3nm yield in Taiwan?


The moisture of the outside world might not matter. But aircon doesn't protect you from earthquakes, alas.


Yep, you need to install a ground conditioner for that.


So, how many earthquakes were there?


https://en.wikipedia.org/wiki/2025_Tainan%E2%80%93Chiayi_ear... has a recent example. Google or your favourite LLM can easily give you more or even a complete list of earthquakes in Taiwan.


I meant this part of USA


China Airlines recently opened a new direct flight route between Taoyuan and Phoenix. They've been plastering it all over their plane signage. I thought it was funny that the flight must be pretty empty other than the handful of TSMC employees that need to go there.


Apparently China Airlines and Starlux are both going to fly that route next year. I have a hard time imagining there's demand for one let alone both.


Phoenix is the fifth-largest city in the United States. It is also one of the major hubs of the West: it's in a good location (midway north), has good weather for planes, and was home to America West's headquarters (its successor, American Airlines, still operates a hub there).

I would think there should be plenty of traffic going through there to Taiwan, similar to the amount going through a hub such as Chicago or NY.


From Wikipedia:

PHX was the 11th-busiest airport in the United States in terms of passenger boardings and the 35th-busiest in the world in 2024. The airport serves as a hub for American Airlines and a base for Frontier Airlines and Southwest Airlines.


Earlier this year Eva Air also announced a direct route to Dallas, supposedly starting next month. At the time I felt like it was a tariff negotiation tactic because that one also does not make sense.


FYI to all: China Airlines is a Taiwanese company (ROC) and has no affiliation with mainland China (PRC).


Then why would nations around the world protect Taiwan?


Why do you think the Chinese people from Taiwan want to do anything in Arizona? They're there just to placate the orange guy's rage. They'll never do anything special there.


My personal experience is that the cost of enduring a negative stimulus is not simply a function of the magnitude of the negative stimulus, but rather the magnitude of the negative stimulus in relation to the magnitude of all other concurrent negative stimuli. This study controls the environment so that a single negative stimulus is isolated and additional external negative stimuli are minimized, but it cannot control for the fact that a depressed person also endures a constant barrage of negative stimuli which are generated internally (hopelessness, exhaustion, fear, self-doubt, etc). The magnitude of these internally generated negative stimuli is likely much larger than that of the aversive external stimulus used in this study, so it seems reasonable that the marginal relief obtained by avoiding the external stimulus may be perceived as relatively negligible, or at least diminished to the point that the cost of avoiding is greater than the cost of enduring.


For future reference, if you want proper python bindings for ffmpeg* you should use pyav.

* To be more precise, these are bindings for the libav* libraries that underlie ffmpeg


And on the seventh day, God ended His work which He had done and began vibe coding the remainder of the human genome.


this should do the trick...

  while creatures:
    c = get_random_creature()
    if c.is_dead():
      creatures.remove(c)  # sets use remove(); pop() takes no argument
    else:
      creatures.add(c.mutate())


You also need selection, not just mutation (I know you are being silly, so am I)


Selection is handled by asynchronous events which populate the is_dead() boolean.

Critiquing my own code, though, it should really be a check against 'can_reproduce()' rather than 'is_dead()'.
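Staying in the same silly register, a runnable version with both pieces: a made-up fitness-gated can_reproduce() for selection, and drifting fitness for mutation (all numbers invented):

```python
import random

random.seed(0)  # deterministic, purely for the demo

class Creature:
    def __init__(self, fitness):
        self.fitness = fitness  # probability of reproducing when picked

    def can_reproduce(self):
        return random.random() < self.fitness  # selection pressure

    def mutate(self):
        # offspring fitness drifts a little from the parent's
        drift = random.uniform(-0.1, 0.1)
        return Creature(min(1.0, max(0.0, self.fitness + drift)))

population = [Creature(random.random()) for _ in range(20)]
for _ in range(2000):
    c = random.choice(population)
    if c.can_reproduce():
        if len(population) < 200:          # soft population cap
            population.append(c.mutate())
    elif len(population) > 1:
        population.remove(c)               # selected against

mean_fitness = sum(c.fitness for c in population) / len(population)
```

With both operators in place, the population's mean fitness tends to drift upward over time, which is the part pure mutation alone can't give you.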


The scientific impact of the transformer paper is large, but in my opinion the novelty is vastly overstated. The primary novelty is adapting the (already existing) dot-product attention mechanism to be multi-headed. And frankly, the single-head -> multi-head evolution wasn't particularly novel -- it's the same trick the computer vision community applied to convolutions 5 years earlier, yielding the widely-adopted grouped convolution. The lasting contribution of the Transformer paper is really just ordering the existing architectural primitives (attention layers, feedforward layers, normalization, residuals) in a nice, reusable block. In my opinion, the most impactful contributions in the lineage of modern attention-based LLMs are the introduction of dot-product attention (Bahdanau et al, 2015) and the first attention-based sequence-to-sequence model (Graves, 2013). Both of these are from academic labs.

As a side note, a similar phenomenon occurred with the Adam optimizer, where the ratio of public/scientific attribution to novelty is disproportionately large (the Adam optimizer is a very minor modification of the RMSProp + momentum optimization algorithm presented in the same Graves, 2013 paper mentioned above).
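To make the "very minor modification" concrete, here is a single-scalar sketch of both update rules (hyperparameter values are illustrative defaults; Graves' variant also tracks a running mean of the gradient, omitted here for brevity):

```python
def rmsprop_momentum_step(theta, g, v, m, lr=1e-3, rho=0.99, mu=0.9, eps=1e-8):
    # RMSProp with momentum: normalize the gradient by a running RMS,
    # then apply classical momentum to the normalized step.
    v = rho * v + (1 - rho) * g * g
    m = mu * m - lr * g / (v ** 0.5 + eps)
    return theta + m, v, m

def adam_step(theta, g, v, m, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: the "momentum" is an exponential average of the raw gradient
    # (not of the normalized step), plus bias correction for the
    # zero-initialized running averages -- essentially the whole novelty.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (v_hat ** 0.5 + eps), v, m
```

Laid side by side, both are "scale the step by a running RMS of the gradient, smooth it over time"; the differences are bookkeeping.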


I think the most novel part of it, and where a lot of the power comes from, is the key-based attention, which operationally gives rise to the emergence of induction heads (whereby pairs of adjacent layers coordinate to provide a powerful context lookup-and-copy mechanism).

The reusable/stackable block is of course a key part of the design since the key insight was that language is as much hierarchical as sequential, and can therefore be processed in parallel (not in sequence) with a hierarchical stack of layers that each use the key-based lookup mechanism to access other tokens whether based on position or not.

In any case, if you look at the seq2seq architectures that preceded it, it's hard to claim that the Transformer is really based on or evolved from any of them (especially the prevailing recurrent approaches), notwithstanding that it obviously leveraged the concept of attention.

I find the developmental history of the Transformer interesting, and wish more had been documented about it. From interviews with Uszkoreit, it seems the idea of parallel language processing based on a hierarchical design using self-attention was his, but that he was personally unable to realize the idea in a way that beat other contemporary approaches. Noam Shazeer was the one who then took the idea and realized it in the form that would eventually become the Transformer, but there appears to have been some degree of throwing the kitchen sink at it, followed by a later ablation process to minimize the design. What would be interesting to know is an honest assessment of how much of the final design was inspiration and how much experimentation. It's hard to imagine that Shazeer anticipated the emergence of induction heads when the model was trained at sufficient scale, so the architecture does seem to be at least partly an accidental discovery, and more than the next-generation seq2seq model it seems to have been conceived as.


Key-based attention is not attributable to the Transformer paper. First paper I can find where keys, queries, and values are distinct matrices is https://arxiv.org/abs/1703.03906, described at the end of section 2. The authors of the Transformer paper are very clear in how they describe their contribution to the attention formulation, writing "Dot-product attention is identical to our algorithm, except for the scaling factor". I think it's fair to state that multi-head is the paper's only substantial contribution to the design of attention mechanisms.

I think you're overestimating the degree to which this type of research is motivated by big-picture, top-down thinking. In reality, it's a bunch of empirically-driven, in-the-weeds experiments that guide a very local search in an intractably large search space. I can just about guarantee the process went something like this:

- The authors begin with an architecture similar to the current SOTA, which was a mix of recurrent layers and attention

- The authors realize that they can replace some of the recurrent layers with attention layers, and performance is equal or better. It's also way faster, so they try to replace as many recurrent layers as possible.

- They realize that if they remove all the recurrent layers, the model sucks. They're smart people and they quickly realize this is because the attention-only model is invariant to sequence order. They add positional encodings to compensate for this.

- They keep iterating on the architecture design, incorporating best-practices from the computer vision community such as normalization and residual connections, resulting in the now-famous Transformer block.

At no point is any stroke of genius required to get from the prior SOTA to the Transformer. It's the type of discovery that follows so naturally from an empirically-driven approach to research that it feels all but inevitable.
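The order-invariance point in the third step above is easy to check directly: with no positional signal, dot-product self-attention is permutation-equivariant, so shuffling the input just shuffles the output. A pure-Python sketch with toy 2-d token vectors (numbers made up):

```python
import math

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    # Plain dot-product self-attention with identity Q/K/V projections:
    # each token attends to every token by raw dot product.
    out = []
    for q in tokens:
        scores = softmax([sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[d] for w, v in zip(scores, tokens))
                    for d in range(len(q))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shuffled = [tokens[2], tokens[0], tokens[1]]

a = self_attention(tokens)
b = self_attention(shuffled)

# Un-shuffling the output recovers the original result (up to float noise):
# token order carried no information at all.
rows_match = all(
    abs(x - y) < 1e-9
    for rb, ra in zip([b[1], b[2], b[0]], a)
    for x, y in zip(rb, ra)
)
```

Positional encodings break exactly this symmetry by making each token's vector depend on where it sits in the sequence.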


I've seen and ignored a lot of "pytorch good, tensorflow bad" takes in my time, but this is so egregiously wrong I can't help but chime in. Facilitating graph-level optimizations has been one of the most central tenets of tensorflow's design philosophy since its inception. The XLA compiler was designed in close collaboration with the tensorflow team and was available in the tensorflow API as far back as 2017. It's not an exaggeration to say that pytorch is 5+ years behind on this front. Before anyone invokes the words "pythonic" or "ergonomic", I'd like to note that the tensorflow 2 API for compilation is nearly identical to torch.compile.


It's not about the API. It's about the documentation + ecosystem.

TF's doesn't seem very good. I just tried to figure out how to learn a linear mapping with TF and went through this:

1. googled "linear layer in tensorflow" and got to the page about linear.

2. spent 5 minutes trying to understand why monotonicity would be a central tenet of the documentation

3. realizing that's not the right "linear" I couldn't think of what the appropriate name would be

4. I know MLPs have them, google "tensorflow mlp example"

5. click the apr '24 page: https://www.tensorflow.org/guide/core/mlp_core

6. read through 10[!] code blocks that are basically just boiler-plate setup of data and visualizations. entirely unrelated to MLPs

7. realize they call it "dense" in tensorflow world

8. see that "dense" needs to be implemented manually

9. think that's strange, google "tensorflow dense layer"

10. find a keras API (https://www.tensorflow.org/api_docs/python/tf/keras/layers/D...)


11. notice that there's a unicode rendering error ("'" for apostrophe) on kernel_initializer and bias_initializer default arguments in the documentation, and wonder why on earth for such a high-level API one would want to expose lora_rank as a first class construct. Also, 3 out of the 5 links in the "Used in the guide" links point to TF1 to TF2 migration articles - TF2 was released 5 years ago.


To add onto this I feel like one of the hard things about TF is that there is like at least 3 ways to do everything because they have supported multiple APIs and migrated to eager. So if you find an example or an open source project it might not be for the flavor of tensorflow that your codebase is in.


Moreover, the first way you find might not be the best or most efficient one.


I feel like that with every single Google API doc. If there's a variable called x, the documentation will read "variable to store x", and you need to create/supply 5 different resources before you can create an x, but each of those will require 5 further things to be figured out first.


One of the reasons I am happy to no longer do Android development: GitHub samples as "documentation".


Re 6: the TF/Keras team encourages random people to write long tutorials to be featured on the official site and have their tutorials included in the official guides. I have seen a lot of subpar devs/AI people write subpar tutorials and then brag on twitter about their tutorials being included on the official Keras site.

I have seen some good ones, too, of course.


Oh god, you just gave me a flashback =) The last time I properly used TF was in early 2019, I am so happy that I don't have to deal with this anymore.


Honestly, this example holds true for roughly half of the Python ecosystem; and you can square the level of frustration if it's also anything coming from Google.

(This pattern is relatively easy to understand: smart people creating something get their gratification from the creation process, not writing tedious documentation; and this is systemically embedded for people at Google, who are probably directly incentivised in a similar way.)



Tensorflow works really well in theory. In practice a lot less so. I saw someone spend months fighting Tensorflow to convert a production model from CPU to GPU inference with any sort of efficiency. Tons of issues due to bugs across versions, deprecations of features across versions, the graph optimizer shuffling data back to the CPU for no decent reason, etc. The person had no idea what was happening or why most of the time due to how black box Tensorflow was. This was a very senior ML engineer with a lot of Tensorflow experience.


GP wrote "simple to use API". You can attribute many qualities to TensorFlow, but this is not one of them.


Does tensorflow have a future? I doubt it. I don't think Google is investing many resources into it (beyond the maintenance necessary to support whatever production models still depend on it). The cost of migrating from old TF to new TF was really large; half the projects that depend on TF that I try to use just break out of the box (only 1/4 of torch projects I try fail that way).

From what I can tell, Google is moving in a direction that doesn't require tensorflow, and I don't see it gaining significant adoption outside Google, so it seems most likely we will simply see it deprecated in about 10 years. It's best seen as a transitional technology that Jeff Dean created to spur ML development internally, which was arguably mistakenly open sourced; these days his reports typically use Jax or other systems.


I think tensorflow-datasets and tensorflow-serving are great, but for model development I think most people use JAX and then export it to a tensorflow SavedModel with Orbax.


But IIUC Jax also leverages XLA, and for the purposes of this discussion the frontend matters only insofar as people feel productive using it. Whether that's TF or Jax.


> Facilitating graph-level optimizations has been one of the most central tenets of tensorflow's design philosophy since its inception.

Agreed of course but it's not like they came up with this approach from scratch. They seem to have just picked it up from Theano (now Aesara/PyTensor).


+1. As someone who has tried to migrate multiple tf.function models to torch.compile, tensorflow's edge here is not small. torch.compile is still highly experimental. Don't believe me? Just go look at the GitHub issues where torch maintainers try to figure out why torch.compile makes code very suboptimal in a lot of cases, or produces incomprehensible errors.


I'm so sorry but Tensorflow is simply one of the worst parts of my job.


Praising XLA by defending Tensorflow of all things has to be one of the strangest takes I've ever come across.

JAX is right there. No need to beat a dead horse when there's a stallion in the stables.


Tensorflow is a lot like IBM -- it deserves praise not because it's great in its current state, but for its contributions towards advancing the broader technological front to where it is today. Tensorflow walked so JAX could run, so to speak. Frankly, I don't really draw much of a distinction between the two frameworks since I really just use them as lightweight XLA wrappers.


Tensorflow started out as anything but lightweight. In my opinion it takes the cake for kludgiest framework I've ever worked with. So verbose, so little effort put into ergonomics. Even eager mode is not really valuable unless you're working on a legacy project.


