Our underlying hardware (AWS) is nothing like this reliable. We see regular (several times a year) failure of racks of machines or whole DCs.
Across the whole fleet (all services), we lose 1-10 servers per day as a baseline. Major events are then on top of that and can impact thousand of hosts at once.
Across the whole fleet (all services), we lose 1-10 servers per day as a baseline. Major events are then on top of that and can impact thousand of hosts at once.