> In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off.