> CPUs are very good about protecting themselves, but I'm shocked no one noticed that they took many hours to boot up or to do anything at all.
Why would they take hours? Something something about energy consumption (and heat) growing with the square of the clock speed. Take a server that can run at, say, 3.2 GHz and limit it to, say, 0.8 GHz, and it's going to be cool to the touch. And at 25% of the speed, something that takes milliseconds or seconds won't suddenly take hours (rough numbers sketched below).
That's why nobody noticed: because nothing took hours, either to boot or to do anything else.
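To spell out the "something something": the usual first-order model for CMOS dynamic power is only a back-of-the-envelope, not a figure for any particular server chip, but it shows why a large frequency cut leaves the chip cool:

```latex
% First-order CMOS dynamic power: activity factor \alpha, switched
% capacitance C, supply voltage V, clock frequency f.
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
% Dropping f from 3.2 GHz to 0.8 GHz alone divides P_dyn by 4; since DVFS
% normally lowers V together with f, the real power drop is steeper still,
% while the slowdown from the frequency cut alone is only about 4x.
```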
Thermal throttling. I have personal experience with that: once I received a laptop that was missing the four screws holding the heatsink. It spent over half an hour trying to boot Windows before powering down due to overheating. Since I'm not used to Windows, I thought that being very slow on the first boot (which sets everything up) might be normal, but suddenly powering down certainly wasn't, and the BIOS event log pointed to the culprit.
The issue is that AFAIK it does not reduce the clock rate; it runs at the normal full clock rate, then when it detects the temperature went over the limit, it pauses the CPU for a while to let it cool down. With a missing CPU fan, that might be enough, but it wasn't enough with the heatsink detached.
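For what it's worth, if the machine runs Linux on a reasonably recent Intel CPU, the kernel also keeps its own count of throttle events, so you don't have to rely on the BIOS log. A minimal sketch, assuming the usual thermal_throttle sysfs layout (the files simply won't exist on systems without it):

```c
/* Minimal sketch: check whether Linux has logged thermal throttling on CPU 0.
 * Assumes an Intel CPU on Linux exposing the thermal_throttle sysfs counters. */
#include <stdio.h>

static void print_count(const char *path, const char *label)
{
    FILE *f = fopen(path, "r");
    unsigned long long count = 0;

    if (!f) {
        printf("%s: counter not available on this machine\n", label);
        return;
    }
    if (fscanf(f, "%llu", &count) == 1)
        printf("%s: %llu throttle events\n", label, count);
    fclose(f);
}

int main(void)
{
    print_count("/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count",
                "core");
    print_count("/sys/devices/system/cpu/cpu0/thermal_throttle/package_throttle_count",
                "package");
    return 0;
}
```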
> That's why nobody noticed: because nothing took hours, either to boot or to do anything else.
It's normal for servers to take a while to boot (before even getting to the operating system).
> The issue is that AFAIK it does not reduce the clock rate; it runs at the normal full clock rate, then when it detects the temperature went over the limit, it pauses the CPU for a while to let it cool down.
I'm 99% confident that pausing the CPU is not a thing. The closest things that exist are sleep/suspend/hibernate, but those don't work like that (triggered by temperature). Other than those, if the machine is on, the CPU is always doing _something_.
> Other than those, if the machine is on, the CPU is always doing _something_.
That used to be the case back in the 1990s when running MS-DOS. Nowadays, every operating system "pauses" the CPU when nothing is going on. The traditional way on x86 was to use the HLT instruction, which stops the processor until an interrupt happens; other architectures have their own equivalents (for instance, ARM has the WFI and WFE instructions), and x86 has more modern ways to do the same thing (like the MONITOR/MWAIT instructions). These instructions allow the processor to dynamically enter lower power modes, for instance by blocking the clock and/or power going into parts of the processor core (that is, "gating" the clock or power).
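One way to see this from userspace, if you happen to be on Linux with the cpuidle framework enabled: the kernel exposes the idle states it uses and how often it has entered them under sysfs. A minimal sketch (the /sys/devices/system/cpu/cpu0/cpuidle layout is assumed; some systems expose fewer or no states):

```c
/* Minimal sketch: list the idle states the Linux kernel has used on CPU 0.
 * Assumes the cpuidle sysfs interface is present. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char path[256], name[64];

    /* Walk state0, state1, ... until a directory is missing. */
    for (int i = 0; ; i++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", i);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                              /* no more idle states */
        if (!fgets(name, sizeof name, f))
            name[0] = '\0';
        fclose(f);
        name[strcspn(name, "\n")] = '\0';

        unsigned long long usage = 0;
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/usage", i);
        if ((f = fopen(path, "r"))) {
            if (fscanf(f, "%llu", &usage) != 1)
                usage = 0;
            fclose(f);
        }
        /* "usage" counts how many times the kernel parked the CPU in this state. */
        printf("state%d: %-12s entered %llu times\n", i, name, usage);
    }
    return 0;
}
```

On a mostly idle machine those usage counters climb constantly; that's the kernel "pausing" the CPU whenever it has nothing to run.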
Isn't that just parts of the CPU, and at the OS level, though? I don't think that happens for thermal reasons, more for "welp, nothing worth doing" reasons.
Maybe I'm wrong; looking back at what I said, I'm having a hard time explaining what my objection is, but I just don't know of a mechanism that's like "welp, I'm too hot, pausing for a few seconds, see you then!" Aren't there buses, caches, and so on that need constant attention whenever the machine is on? The CPU can't just tell everything to wait for a bit.