If you’re doing it right, the DR process is basically the deployment process, and it gets tested every time you deploy. We used Chef, Docker, and stored snapshot images, and every deploy spun up new infrastructure from scratch; once it had passed the automated tests, the load balancers would switch over to the new instances. DBs were created from binary snapshots and would then slave off the live DB to catch up (never more than an hour of diff), which also ensured we had a continuously tested DB backup process. The previous instance would get torn down after 8 hours, which was long enough to let any straggling processes finish and gave us somewhere to roll back to if needed.
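
To make the flow concrete, here's a minimal Python sketch of that blue/green cycle. The helpers (provision_stack, run_smoke_tests, switch_load_balancer, schedule_teardown) are hypothetical stand-ins for the real Chef/Docker/load-balancer tooling, not our actual code:

    # Minimal sketch, assuming hypothetical helpers that wrap the real tooling.
    def provision_stack(image_tag):
        # Real version: chef-client + docker run from stored snapshot images,
        # plus a DB restored from a binary snapshot that then slaves off live.
        print(f"provisioning stack for {image_tag}")
        return f"stack-{image_tag}"

    def run_smoke_tests(stack_id):
        # Real version: the automated test suite pointed at the new stack.
        print(f"testing {stack_id}")
        return True

    def switch_load_balancer(stack_id):
        # Flip the load balancers only once the new stack is proven healthy.
        print(f"load balancer now pointing at {stack_id}")

    def schedule_teardown(stack_id, delay_hours=8):
        # Old stack lingers for 8h: stragglers finish, and it stays a rollback target.
        print(f"{stack_id} scheduled for teardown in {delay_hours}h")

    def deploy(image_tag, current_stack):
        new_stack = provision_stack(image_tag)
        if not run_smoke_tests(new_stack):
            raise RuntimeError("new stack failed tests; live traffic never moved")
        switch_load_balancer(new_stack)
        schedule_teardown(current_stack)
        return new_stack

    deploy("app:v42", current_stack="stack-app:v41")

The point is that the provisioning path exercised on every deploy is the same one you'd lean on in a disaster, so it never rots untested.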

All of these snapshots got stored in the cloud, kept locally in our office, and also written onto DVD-R, all automatically, and all verified each time.
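
A rough sketch of what that verification could look like - a straight checksum comparison over hypothetical paths, assuming the cloud and DVD-R copies are mounted somewhere readable:

    # Sketch only: confirm every stored copy matches the source snapshot byte-for-byte.
    import hashlib

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_copies(original, copies):
        expected = sha256(original)
        return {c: sha256(c) == expected for c in copies}

    # Hypothetical mount points for the cloud, office, and DVD-R copies:
    # verify_copies("/snapshots/db.img",
    #               ["/mnt/cloud/db.img",
    #                "/mnt/office-nas/db.img",
    #                "/mnt/dvd/db.img"])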

Our absolute worst-case scenario would be less than an hour of downtime and less than an hour of data loss.

Similarly, our dev environments were a watered-down version of the live environment, so if they were somehow lost they could be restored in the same manner - and again, this was frequently tested, as any merge into the preprod branch would trigger a new dev environment to spin up automatically with that codebase.
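
A minimal sketch of that trigger, assuming a hypothetical CI webhook payload and a stubbed provision_stack standing in for the full provisioning path above; the real hook would be whatever your CI system delivers:

    # Sketch of the post-merge hook, with a hypothetical webhook payload shape.
    def provision_stack(image_tag):
        # Stub for the same provisioning path used by production deploys.
        return f"stack-{image_tag}"

    def on_merge(payload):
        # Only merges into preprod rebuild the dev environment.
        if payload.get("target_branch") != "preprod":
            return None
        image_tag = payload["commit_sha"][:12]
        # Same code path as a production deploy, just a smaller footprint.
        return provision_stack(f"dev-{image_tag}")

    print(on_merge({"target_branch": "preprod",
                    "commit_sha": "a1b2c3d4e5f6a7b8c9d0"}))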

It took up-front engineering effort to get in place, but it ended up saving our bacon twice and made our entire pipeline much easier and faster to manage.


