Adding more logical registers is compatibility breaking. And, since you have to ...

CalChris · on Nov 4, 2021

Adding more logical registers doesn't have to break application compatibility. Intel added x87, MMX ... extensions with their new register sets all without breaking compatibility. They even doubled the integer register set in x86_64 with the REX prefix. New programs could use these features and their register sets without existing programs being broken.

What register renaming allows is to increase the performance of both new and existing programs, which is no mean feat. It allows the CPU scheduler to search for more out-of-order parallelism rather than relying on the compiler to find in-order parallelism.
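To make the renaming idea above concrete, here is a minimal sketch of a rename table. It is not Tomasulo's reservation-station scheme and not any particular CPU's implementation. The `rename` function, the `(dest, src1, src2)` instruction tuples and the `pN` physical register names are all made up for illustration, and the sketch assumes an unlimited physical register file that is never recycled.

```python
from itertools import count


def rename(instructions, arch_regs):
    """Rewrite architectural register names as physical register names.

    instructions: list of (dest, src1, src2) architectural register names.
    arch_regs: every architectural register the program may touch.
    Assumption: physical registers are unlimited and never freed.
    """
    fresh = count()
    # Each architectural register starts out mapped to its own physical register.
    table = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = table[src1], table[src2]
        # Every write gets a brand-new physical register, so a later write to
        # the same architectural name no longer conflicts with earlier uses.
        table[dest] = f"p{next(fresh)}"
        renamed.append((table[dest], s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r2"),  # r4 = r1 op r2 (true dependency on r1)
    ("r1", "r3", "r3"),  # r1 = r3 op r3 (only reuses the name r1)
]
renamed = rename(program, ["r1", "r2", "r3", "r4"])
# renamed == [("p4", "p1", "p2"), ("p5", "p4", "p1"), ("p6", "p2", "p2")]
```

After renaming, the third instruction writes `p6` and reads only `p2`, so it shares nothing with the first two and could execute ahead of them. The binary still names only four registers, which is why existing programs benefit without being recompiled.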

This binary compatibility doesn't seem very important now, don't break userspace excepted, but it was then. Compatibility made IBM and Intel hundreds of billions of dollars.

monocasa · on Nov 4, 2021

Adding those registers each time broke OS compatibility at a bare minimum, since enabling them required kernel changes to save and restore the new architectural registers. Adding to the physical register file allows existing code (including kernel space) to take advantage of it. There have been some extensions lately, like XSAVE, that try to address that, but they're not fully embraced by major kernels AFAIK.

CalChris · on Nov 4, 2021

Tomasulo's register renaming algorithm (1967) comes from an era when you bought the computer and the operating system together. So OS compatibility was their problem. That's not our era but the resulting in-order vs out-of-order war is long over.

Register renaming enabled out-of-order execution. The 90s were a competition between in-order compiler-based scheduling and out-of-order CPUs. The in-order proponents said that out-of-order was too power hungry, too complex and wouldn't scale. Well, it did. Even Itanium, the great in-order hope, had out-of-order execution in its last microarchitecture, Poulson.

Ultimately, out-of-order won the war but in-order survives for low power low complexity designs; the A53 is in-order. Skylake Server has 180 physical registers with an out-of-order search window of 224.

https://www.primeline-solutions.com/media/wysiwyg/news-press...

monocasa · on Nov 4, 2021

In the 1960s, OS compat was your problem as the end user too. There wasn't the strict divide between OS code and user code in the same way. The 360/91 that Tomasulo's algorithm originally shipped on didn't even have an MMU to separate your code from the kernel.

Additionally, compiler tech wasn't anywhere near where it is today, and high perf code was written in ASM. Therefore existing code would have had to be rewritten to use more registers, but the 360/91 ran existing code just fine (which was very important for the 360/91's main customers).

woodruffw · on Nov 4, 2021

x87 and MMX's register encodings exist in (mostly) separate parts of the x86 operand encoding map. That's in contrast to GPRs, which have to squeeze into 3 (sometimes 4, with the REX prefix) bits.

That's where the incompatibility comes from -- x86-64 required an entirely new prefix to merely double the GPRs; adding a few hundred more would require some very substantial changes to the opcode map and all decoders already out there.
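A rough sketch of the 3-bit vs 4-bit point above: the ModRM byte carries two 3-bit register fields, and the REX prefix contributes one extra high bit to each. The helper name `decode_reg_fields` is made up for illustration, and the sketch assumes register-direct operands only (ModRM `mod == 0b11`), ignoring SIB bytes and memory addressing.

```python
def decode_reg_fields(rex, modrm):
    """Return (reg, rm) register numbers from a REX prefix and a ModRM byte.

    rex: REX prefix byte (0x40-0x4F, layout 0100WRXB), or None if absent.
    modrm: ModRM byte (layout mod:2, reg:3, rm:3).
    Assumption: register-direct form (mod == 0b11), no SIB byte.
    """
    reg = (modrm >> 3) & 0b111  # 3-bit reg field: registers 0-7 only
    rm = modrm & 0b111          # 3-bit r/m field: registers 0-7 only
    if rex is not None:
        if not 0x40 <= rex <= 0x4F:
            raise ValueError("not a REX prefix")
        reg |= ((rex >> 2) & 1) << 3  # REX.R supplies the 4th bit of reg
        rm |= (rex & 1) << 3          # REX.B supplies the 4th bit of r/m
    return reg, rm


# 48 89 C3 is `mov rbx, rax`: reg = 0 (rax), rm = 3 (rbx)
print(decode_reg_fields(0x48, 0xC3))  # (0, 3)
# 4D 89 C8 is `mov r8, r9`: reg = 9 (r9), rm = 8 (r8)
print(decode_reg_fields(0x4D, 0xC8))  # (9, 8)
# The same ModRM byte without REX can only reach registers 0-7.
print(decode_reg_fields(None, 0xC8))  # (1, 0)
```

One extra prefix bit per field gets you from 8 to 16 registers. Naming a few hundred would need several more bits per operand, and there is nowhere in this layout to put them.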

CalChris · on Nov 4, 2021

When x87+MMX were added, existing programs ran unchanged. When x86-64 doubled the register sets, many existing programs ran unchanged (some features were dropped). Compatibility was largely maintained. That compatibility was what AMD wagged in Intel's face when Intel was trying to pivot to Itanium. Intel had to then take the walk of shame and adopt AMD's approach.

Seriously, Intel took a long view towards this. x87 was a wart on the side of a mole, and still its unholy marriage with MMX (they shared a register set) allowed existing programs to run while creating a compatibility barrier to competitors. Competitors had to be compatible and bug compatible. The guy tasked with doing this at Transmeta almost had a nervous breakdown, not from compatibility (easy) but from bug compatibility.

IBM 360 programs still run on the Z architecture.

chasil · on Nov 4, 2021

The original 8087 prompted the IEEE 754 floating point standard, which had a profound impact on the entire field of computer science.

https://news.ycombinator.com/item?id=17767925

https://news.ycombinator.com/item?id=23205225

https://news.ycombinator.com/item?id=23362673

https://news.ycombinator.com/item?id=18107165

woodruffw · on Nov 4, 2021

I'm not saying it's impossible! They certainly have plenty of space in the EVEX scheme. But extending the GPRs is a much bigger lift, tooling-wise, than is adding a relatively disjoint ISA extension. Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.

Besides, register renaming seems to be working splendidly at the uarch level. Why complicate the architectural model when the gains are already present?

monocasa · on Nov 4, 2021

> Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.

Just wanted to throw out there that it was AMD that came up with x86-64's ISA rather than Intel. Intel was still pushing Itanium hard at the time.

garaetjjte · on Nov 4, 2021

x87 instructions weren't dropped, they are still available on x86_64.

CalChris · on Nov 4, 2021

Yeah, you're right. "64-Bit Mode: Valid". Thanks.