Great summary, this matches my experience. For straight-line code, modern C compilers can't be beat. But when it comes to register allocation, they constantly make decisions that are real head-scratchers.
One of the biggest problems is when cold paths compromise the efficiency of hot paths. You would hope that __builtin_expect() would help, but from what I can tell __builtin_expect() has no direct impact on register allocation. I wish the compiler would use this information to make sure that cold paths can never compromise the register allocation of the hot paths, but I constantly see register shuffles or spills on hot paths that are only for the benefit of cold paths.
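To make the complaint concrete, here is a minimal sketch of the usual manual workaround: mark the rare branch with `__builtin_expect()` and push its body into a separate `noinline`, `cold` function, so the register-hungry work at least lives outside the hot loop. The names `UNLIKELY`, `sum_bytes`, and `sum_slow_path` are hypothetical and only for illustration, and it assumes GCC or Clang. Nothing here is guaranteed to help: the call itself still forces values that are live across it into callee-saved registers or onto the stack, so the only way to know is to read the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: tells GCC/Clang the condition is rarely true. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical cold path, kept out of line so that its own register
 * needs are confined to a separate function body. */
__attribute__((noinline, cold))
static uint64_t sum_slow_path(uint64_t acc, uint8_t byte) {
    /* Stand-in for rare, expensive work (error handling, escapes, ...). */
    return acc + (uint64_t)byte * 2u;
}

/* Hot loop: bytes below 0x80 are assumed to be the common case. */
uint64_t sum_bytes(const uint8_t *data, size_t len) {
    uint64_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        if (UNLIKELY(data[i] >= 0x80)) {
            /* Rare case: hand off to the out-of-line function. */
            acc = sum_slow_path(acc, data[i]);
        } else {
            /* Common case: plain accumulation. */
            acc += data[i];
        }
    }
    return acc;
}
```

Compiling with something like `-O2 -S` and comparing the loop body with and without the cold call is the quickest way to see whether the hot path picked up extra moves or spills.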
Is there anywhere I can follow your work? I am very interested in keeping track of the state of the art.
Yeah, I did a quick check in LLVM at some point to see what it does (query I relied on: https://github.com/llvm/llvm-project/search?q=getPredictable...) and all the results seemed to be exclusively about code motion or deciding how to lower a branch. Similarly, cold path outlining seemed to just split the function in a fairly simple manner rather than doing anything beyond that. Perhaps I missed something, but I think the current hints are there to help the branch predictor or instruction cache rather than to significantly alter codegen.
Unfortunately, I don't have much to share at the moment besides my thoughts; I've done a few small tests but haven't been able to really do a full implementation yet. The primary consumer of this work would be iSH (https://github.com/ish-app/ish), which has a need for a fast interpreter, so you can at least take a look at the current implementation to see what we'd like to replace. The nature of the project means that most of my time has been tied up in things like making sure that keyboard avoidance is set up correctly and that users can customize the background color of their terminal :/
With that said, I'd be happy to chat more if you'd like; feel free to send me an email or whatever. Not sure I can say I'm at the state of the art yet, but perhaps we can get there :)