Crowdstrike pushed a configuration change that was a malformed file, which was picked up by every computer running the agent (millions of machines across the globe). It's not like hospitals and IT departments are manually running this update and can just roll it back.
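To illustrate the failure mode (this is a generic sketch, not Crowdstrike's actual code or file format, and every name in it is made up): an agent that validates a content update before activating it, and falls back to the last-known-good file, wouldn't crash-loop on malformed input.

    import json
    import logging

    log = logging.getLogger("agent")

    def load_channel_file(path: str, fallback_path: str) -> dict:
        """Validate a content update before activating it, falling back
        to the last-known-good file instead of crashing on bad input."""
        try:
            with open(path, "rb") as f:
                raw = f.read()
            if not raw.strip():
                raise ValueError("empty channel file")
            config = json.loads(raw)  # reject non-parseable content
            # 'rules' is a hypothetical required section, for illustration
            if not isinstance(config, dict) or "rules" not in config:
                raise ValueError("missing 'rules' section")
            return config
        except (OSError, ValueError) as exc:
            log.error("rejecting update %s: %s", path, exc)
            with open(fallback_path, "rb") as f:
                return json.load(f)  # last-known-good config

In kernel mode the stakes are much higher, of course: a bad parse there takes the whole machine down with it instead of just logging an error.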
As to why they didn't catch this during tests, or why they don't perform gradual change rollouts to hosts, your guess is as good as mine. I hope we get a public postmortem for this.
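For what it's worth, a staged (canary) rollout isn't exotic. Roughly something like the sketch below, where the ring sizes, thresholds, and helper functions (push_update, is_healthy) are all invented for illustration:

    import time

    RINGS = [0.001, 0.01, 0.10, 1.00]  # fraction of the fleet per stage
    MAX_CRASH_RATE = 0.001             # halt if >0.1% of a ring falls over
    SOAK_SECONDS = 3600                # let each ring soak before widening

    def push_update(host: str, update_id: str) -> None:
        """Stub: deliver the update to one host (transport not modeled)."""

    def is_healthy(host: str) -> bool:
        """Stub: check host telemetry, e.g. is it still checking in?"""
        return True

    def rollout(update_id: str, fleet: list[str]) -> bool:
        pushed = 0
        for fraction in RINGS:
            target = int(len(fleet) * fraction)
            for host in fleet[pushed:target]:
                push_update(host, update_id)
            pushed = max(pushed, target)
            time.sleep(SOAK_SECONDS)
            crashed = sum(1 for h in fleet[:pushed] if not is_healthy(h))
            if pushed and crashed / pushed > MAX_CRASH_RATE:
                return False  # stop here; the bad update never goes wide
        return True

With these made-up numbers, the first ring would have contained the blast radius to about 0.1% of the fleet instead of all of it.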
Considering Crowdstrike mentioned in their blog that systems with the 'falcon sensor' installed weren't affected [1], and the update is falcon content, I'm not sure it was a malformed file; it may just have been software that required this sensor to be installed. Perhaps their QA only checked whether the update broke systems with the sensor installed, and didn't do a regression check on Windows systems without it.
It says that if a system isn’t “affected”, meaning it doesn’t reboot in a loop, then the “protection” works and nothing needs to be done. That’s because the Crowdstrike central systems, on which the agents running on the clients’ systems rely, are working well.
The “sensor” is what the clients actually install and run on their machines in order to “use Crowdstrike”.
The crash happened in a file named csagent.sys, which on my machine was something like a week old.