> CPUs are very good about protecting themselves, but I'm shocked no one noticed that they took many hours to boot up or to do anything at all.
Why would they take hours? Something something about energy consumption (and heat) growing with the square of the clock speed. Take a server that can run at, say, 3.2 GHz and limit it to, say, 0.8 GHz, and it's going to be cool to the touch. And at 25% of the speed, something that takes milliseconds or seconds won't suddenly take hours (rough numbers sketched below).
That's why nobody noticed: because nothing took hours, either to boot or to do anything else.
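To spell out the "something something": the usual first-order model for CMOS dynamic power is only a back-of-the-envelope, not a figure for any particular server chip, but it shows why a large frequency cut leaves the chip cool:

```latex
% First-order CMOS dynamic power: activity factor \alpha, switched
% capacitance C, supply voltage V, clock frequency f.
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
% Dropping f from 3.2 GHz to 0.8 GHz alone divides P_dyn by 4; since DVFS
% normally lowers V together with f, the real power drop is steeper still,
% while the slowdown from the frequency cut alone is only about 4x.
```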
Thermal throttling. I have personal experience with that: once I received a laptop that was missing the four screws holding the heatsink. It spent over half an hour trying to boot Windows before powering down due to overheating. Since I'm not used to Windows, I thought that being very slow on the first boot (which sets everything up) might be normal, but suddenly powering down certainly wasn't, and the BIOS event log pointed to the culprit.
The issue is that AFAIK it does not reduce the clock rate; it runs at the normal full clock rate, then when it detects the temperature went over the limit, it pauses the CPU for a while to let it cool down. With a missing CPU fan, that might be enough, but it wasn't enough with the heatsink detached.
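For what it's worth, if the machine runs Linux on a reasonably recent Intel CPU, the kernel also keeps its own count of throttle events, so you don't have to rely on the BIOS log. A minimal sketch, assuming the usual thermal_throttle sysfs layout (the files simply won't exist on systems without it):

```c
/* Minimal sketch: check whether Linux has logged thermal throttling on CPU 0.
 * Assumes an Intel CPU on Linux exposing the thermal_throttle sysfs counters. */
#include <stdio.h>

static void print_count(const char *path, const char *label)
{
    FILE *f = fopen(path, "r");
    unsigned long long count = 0;

    if (!f) {
        printf("%s: counter not available on this machine\n", label);
        return;
    }
    if (fscanf(f, "%llu", &count) == 1)
        printf("%s: %llu throttle events\n", label, count);
    fclose(f);
}

int main(void)
{
    print_count("/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count",
                "core");
    print_count("/sys/devices/system/cpu/cpu0/thermal_throttle/package_throttle_count",
                "package");
    return 0;
}
```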
> That's why nobody noticed: because nothing took hours, either to boot or to do anything else.
It's normal for servers to take a while to boot (before even getting to the operating system).
> The issue is that AFAIK it does not reduce the clock rate; it runs at the normal full clock rate, then when it detects the temperature went over the limit, it pauses the CPU for a while to let it cool down.
I'm 99% confident that pausing the CPU is not a thing. The closest things that exist are sleep/suspend/hibernate, but those don't work like that (triggered by temperature). Other than those, if the machine is on, the CPU is always doing _something_.
> Other than those, if the machine is on, the CPU is always doing _something_.
That used to be the case back in the 1990s when running MS-DOS. Nowadays, every operating system "pauses" the CPU when nothing is going on. The traditional way on x86 was to use the HLT instruction, which stops the processor until an interrupt happens; other architectures have their own equivalents (for instance, ARM has the WFI and WFE instructions), and x86 has more modern ways to do the same thing (like the MONITOR/MWAIT instructions). These instructions allow the processor to dynamically enter lower power modes, for instance by blocking the clock and/or power going into parts of the processor core (that is, "gating" the clock or power).
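One way to see this from userspace, if you happen to be on Linux with the cpuidle framework enabled: the kernel exposes the idle states it uses and how often it has entered them under sysfs. A minimal sketch (the /sys/devices/system/cpu/cpu0/cpuidle layout is assumed; some systems expose fewer or no states):

```c
/* Minimal sketch: list the idle states the Linux kernel has used on CPU 0.
 * Assumes the cpuidle sysfs interface is present. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char path[256], name[64];

    /* Walk state0, state1, ... until a directory is missing. */
    for (int i = 0; ; i++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", i);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                              /* no more idle states */
        if (!fgets(name, sizeof name, f))
            name[0] = '\0';
        fclose(f);
        name[strcspn(name, "\n")] = '\0';

        unsigned long long usage = 0;
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/usage", i);
        if ((f = fopen(path, "r"))) {
            if (fscanf(f, "%llu", &usage) != 1)
                usage = 0;
            fclose(f);
        }
        /* "usage" counts how many times the kernel parked the CPU in this state. */
        printf("state%d: %-12s entered %llu times\n", i, name, usage);
    }
    return 0;
}
```

On a mostly idle machine those usage counters climb constantly; that's the kernel "pausing" the CPU whenever it has nothing to run.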
Isn't that just parts of the CPU, and at the OS level, though? I don't think that happens for thermal reasons, more for "welp, nothing worth doing" reasons.
Maybe I'm wrong; looking back at what I said, I'm having a hard time explaining what my objection is, but I just don't know of a mechanism that's like "welp, I'm too hot, pausing for a few seconds, see you then!" Aren't there buses, caches, and so on that need constant attention whenever the machine is on? The CPU can't just tell everything to wait for a bit.