Not much if you adjust the ISA for it! https://arxiv.org/abs/2007.15919 This pap...

coder543 · on Jan 1, 2023

I've only glanced over that paper, but they're using a processor with a 5-stage pipeline, which is very short by modern standards (Zen 1 uses a 19-stage pipeline, and I couldn't quickly find the number for later Zen generations). A very short pipeline significantly reduces the advantage of control flow speculation (CFS), since waiting for a branch to resolve only costs a few cycles... but they still showed CFS offering up to a ~50% advantage over their best alternative, if I'm reading that right.
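To make the pipeline-depth point concrete, here is a toy analytical model (not the paper's model, and not how Zen actually works): a branch is assumed to resolve near the end of the pipeline, so without speculation every branch stalls for about `depth - 1` cycles, while with speculation only mispredicted branches pay that penalty. The branch frequency (20%), predictor accuracy (95%) and base CPI of 1.0 are illustrative assumptions, not measurements.

```python
def cpi(base_cpi, branch_freq, penalty_cycles, stall_rate):
    """Cycles per instruction in a toy in-order pipeline model.

    stall_rate is the fraction of branches that pay the full penalty:
    1.0 means no speculation (stall on every branch until it resolves),
    while (1 - predictor accuracy) approximates control flow speculation.
    """
    return base_cpi + branch_freq * stall_rate * penalty_cycles


def speculation_speedup(pipeline_depth, branch_freq=0.2, accuracy=0.95, base_cpi=1.0):
    # Assumption: branches resolve in the last stage, so a stall costs
    # roughly (pipeline_depth - 1) cycles. Real cores resolve earlier.
    penalty = pipeline_depth - 1
    no_spec = cpi(base_cpi, branch_freq, penalty, stall_rate=1.0)
    spec = cpi(base_cpi, branch_freq, penalty, stall_rate=1.0 - accuracy)
    return no_spec / spec


for depth in (5, 19):
    print(f"{depth:>2}-stage pipeline: {speculation_speedup(depth):.2f}x")
# prints:
#  5-stage pipeline: 1.73x
# 19-stage pipeline: 3.90x
```

Under these made-up parameters the benefit of speculation roughly doubles going from 5 to 19 stages, which is the direction of the argument, though the absolute numbers say nothing about the paper's benchmarks.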

I wish they had included the geometric mean of their benchmarks, but I didn't see it anywhere, and I'm not going to run the numbers right now. Even if the speedup CFS offered on a 5-stage pipeline is "only" 25%, that is still huge... and on a deeper pipeline, that delta would grow. "Not much" is drastically different from my interpretation of those results.
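For reference, the geometric mean of per-benchmark speedups is the usual way to summarize ratios like these. A minimal sketch follows; the input values are placeholders for illustration, not figures from the paper. (Python 3.8+ also ships `statistics.geometric_mean`.)

```python
import math


def geometric_mean(speedups):
    """Geometric mean of per-benchmark speedup ratios (each must be > 0).

    Averaging logarithms keeps one outlier benchmark from dominating the
    summary the way it would in an arithmetic mean of ratios.
    """
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Placeholder ratios for illustration only, not results from the paper.
print(f"{geometric_mean([1.1, 1.25, 1.5]):.3f}")
```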

I do think security is extremely important, but I'm not convinced that things are currently so terrible that this is the only way forward, as the authors seemed to imply.

OTOH, I would enjoy seeing a return of an Itanium-style ISA that moves a lot of speculation from the hardware to the compiler. I think compilers are in a much better place now than they were when Itanium hit the scene, and the immature compilers of that era certainly didn't help Itanium's problems.

cpgxiii · on Jan 1, 2023

In many ways we've seen that return to dumber hardware + smarter compilers in the GPGPU realm, although even there the hardware continues to get more capable over time.

Those applications tend to work, though, because either the compiler generates fat binaries that support multiple architecture versions (e.g. CUDA), or it ships some sort of IR, or compilation happens at runtime (e.g. OpenCL). That approach doesn't really work if you want to generate a single binary that performs well across a wide range of hardware versions. That matters most to users asking "how will application X perform on future hardware Y?", and it really gets in the way of general-purpose use.
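As a rough illustration of the fat-binary-plus-IR approach, here is a simplified model of CUDA-style code selection. It is a sketch under stated assumptions, not the driver's actual logic: precompiled machine code for compute capability `(major, minor)` is assumed to run on devices with the same major version and an equal or newer minor version, and otherwise the embedded PTX is JIT-compiled if its virtual architecture is not newer than the device. `FatBinary` and `select_code` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Arch = Tuple[int, int]  # (major, minor) compute capability


@dataclass(frozen=True)
class FatBinary:
    # Architectures with precompiled machine code (cubins) in the binary.
    cubin_targets: Tuple[Arch, ...]
    # Virtual architecture of the embedded PTX IR, or None if no IR shipped.
    ptx_target: Optional[Arch]


def select_code(binary, device):
    """Decide how the (hypothetical) fat binary would run on `device`."""
    # Machine code: same major version, equal or older minor version.
    compatible = [t for t in binary.cubin_targets
                  if t[0] == device[0] and t[1] <= device[1]]
    if compatible:
        return ("cubin", max(compatible))
    # Fallback: JIT-compile the IR, if it doesn't target a newer architecture.
    if binary.ptx_target is not None and binary.ptx_target <= device:
        return ("jit-ptx", binary.ptx_target)
    return ("unsupported", None)


app = FatBinary(cubin_targets=((7, 0), (8, 0)), ptx_target=(8, 0))
for device in ((7, 5), (8, 6), (9, 0), (6, 1)):
    print(device, select_code(app, device))
```

The `(9, 0)` case runs only because the binary carried IR, and the `(6, 1)` case fails outright: the binary is only as portable as the targets and IR it was built with, which is the limitation described above.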

That's really the great advantage of putting more smarts in the hardware - you can evolve the processor design (often to improve performance) while executing the same binaries.

shiftingleft · on Jan 1, 2023

Thanks for the insights!

In my defense, 1.5X is "not much" when compared with 5 to 100X :-)