The entire point of this post is to share a war story and help others understand some Redis problems and debugging steps. It isn't a deeper optimization guide or anything. I have caused and encountered this exact problem multiple times in my career now. On my first day at Instacart, the entire site was down in an extended outage due to a hot key issue exactly like this. I think stories like this are worth sharing with the community even if they seem trivial to you.
This is a team of 2 engineers that's building and shipping faster than most teams I've seen, while serving an insane number of DAUs. Moving fast (and patching problems even faster) is the name of the game here, and these guys delivered.