Adding more logical registers breaks compatibility. And since the register specifier has to be encoded in the instruction, more registers means larger instructions (hence the compatibility break), which makes reading and decoding instructions slower.
And regardless of how many logical registers you have, you need more physical registers than logical ones. These are needed for out-of-order (OOO) execution and speculative execution. An OOO superscalar speculative CPU can have hundreds of instructions in flight, and these instructions all need physical registers to work on. If you don't have excess physical registers you can't do OOO or speculative execution.
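To make the "excess physical registers" point concrete, here is a toy rename stage in Python. It is only a sketch under simplifying assumptions (one destination per instruction, no branch recovery, no actual scheduling); the `Renamer` class and its method names are made up for illustration and don't correspond to any real CPU's implementation.

```python
from collections import deque


class Renamer:
    """Toy rename stage: maps logical registers onto a larger physical file."""

    def __init__(self, num_logical, num_physical):
        assert num_physical >= num_logical
        # Initially logical register i lives in physical register i.
        self.map_table = list(range(num_logical))
        # Every remaining physical register is free for in-flight results.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dst, srcs):
        """Rename one instruction.

        Returns (new_phys_dst, phys_srcs, old_phys_dst), or None when no
        physical register is free and the front end would have to stall.
        """
        if not self.free_list:
            return None
        # Sources read whatever physical register currently holds the value.
        phys_srcs = [self.map_table[s] for s in srcs]
        # The destination gets a fresh physical register, so an older
        # in-flight writer or reader of the same logical register is untouched.
        old = self.map_table[dst]
        new = self.free_list.popleft()
        self.map_table[dst] = new
        return new, phys_srcs, old

    def retire(self, old_phys_dst):
        """At retirement the previous mapping can no longer be read; free it."""
        self.free_list.append(old_phys_dst)


if __name__ == "__main__":
    r = Renamer(num_logical=8, num_physical=12)
    print(r.rename(dst=1, srcs=[2, 3]))  # r1 = r2 + r3
    print(r.rename(dst=4, srcs=[1, 1]))  # r4 = r1 + r1 (true dependence on r1)
    print(r.rename(dst=1, srcs=[5, 6]))  # r1 = r5 + r6 (reuses the name r1)
```

The third instruction reuses the name r1 but gets a different physical register than the first, so it doesn't have to wait for the second instruction to read the old value. With `Renamer(8, 8)` the free list starts empty and the very first `rename` call returns `None`, which is the "no excess physical registers, no OOO" case.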
Adding more logical registers doesn't have to break application compatibility. Intel added x87, MMX ... extensions with their new register sets all without breaking compatibility. They even doubled the integer register set in x86_64 with the REX prefix. New programs could use these features and their register sets without existing programs being broken.
What register renaming does is increase the performance of both new and existing programs, which is no mean feat. It lets the CPU's scheduler search for more out-of-order parallelism at run time rather than relying on the compiler to find in-order parallelism.
This binary compatibility doesn't seem very important now ("don't break userspace" excepted), but it was then. Compatibility made IBM and Intel hundreds of billions of dollars.
Each time those registers were added, it broke OS compatibility at the bare minimum: the OS had to enable them, and kernel changes were required to save and restore the new architectural registers. Adding to the physical register file allows existing code (including kernel space) to take advantage of it. There have been some extensions lately, like xsave, that try to address that, but they're not fully embraced by major kernels AFAIK.
Tomasulo's register renaming algorithm (1967) comes from an era when you bought the computer and the operating system together. So OS compatibility was their problem. That's not our era but the resulting in-order vs out-of-order war is long over.
Register renaming enabled out-of-order execution. The 90s were a competition between in-order, compiler-based scheduling and out-of-order CPUs. The in-order proponents said that out-of-order was too power hungry, too complex and wouldn't scale. Well, it did. Even Itanium, which was the great in-order hope, had out-of-order execution in its last microarchitecture, Poulson.
Ultimately, out-of-order won the war, but in-order survives for low-power, low-complexity designs; the A53 is in-order. Skylake Server has 180 physical integer registers with an out-of-order search window (reorder buffer) of 224 entries.
In the 1960s, OS compat was your problem as the end user too. There wasn't the strict divide between OS code and user code in the same way. The 360/91 that Tomasulo's algorithm originally shipped on didn't even have an MMU to separate your code from the kernel.
Additionally, compiler tech wasn't anywhere near where it is today, and high perf code was written in ASM. Therefore existing code would have had to be rewritten to use more registers, but the 360/91 ran existing code just fine (which was very important for the 360/91's main customers).
x87 and MMX's register encodings exist in (mostly) separate parts of the x86 operand encoding map. That's in contrast to GPRs, which have to squeeze into 3 (sometimes 4, with the REX prefix) bits.
That's where the incompatibility comes from -- x86-64 required an entirely new prefix to merely double the GPRs; adding a few hundred more would require some very substantial changes to the opcode map and all decoders already out there.
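For anyone who hasn't looked at the encoding, here's a rough Python sketch of how the 3-bit fields get stretched to 4 bits. It only handles the register-direct case (ModRM with mod = 3) and ignores SIB bytes and the other prefixes; `decode_modrm_regs` is a name made up for this example, not part of any real disassembler.

```python
def decode_modrm_regs(modrm, rex=None):
    """Return (reg, rm) register numbers from a ModRM byte.

    Assumes register-direct addressing (mod == 0b11). Without a REX prefix
    each field is 3 bits, so only registers 0-7 are reachable. A REX prefix
    (0100WRXB) contributes one extra high bit per field: REX.R extends
    ModRM.reg and REX.B extends ModRM.rm.
    """
    if (modrm >> 6) != 0b11:
        raise ValueError("sketch only handles register-direct ModRM (mod == 3)")
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if rex is not None:
        if rex & 0xF0 != 0x40:
            raise ValueError("not a REX prefix")
        reg |= ((rex >> 2) & 1) << 3  # REX.R becomes bit 3 of reg
        rm |= (rex & 1) << 3          # REX.B becomes bit 3 of rm
    return reg, rm


if __name__ == "__main__":
    # 48 89 D8 is `mov rax, rbx`: REX.W only, both registers fit in 3 bits.
    print(decode_modrm_regs(0xD8, rex=0x48))
    # 4D 89 C8 is `mov r8, r9`: REX.R and REX.B supply the fourth bit.
    print(decode_modrm_regs(0xC8, rex=0x4D))
```

One prefix byte buys exactly one extra bit per register field, which is why it doubled the GPRs and no more. Going to hundreds of registers would need several more bits per operand, and there's nowhere left in this scheme to put them.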
When x87+MMX were added, existing programs ran unchanged. When x86-64 doubled the register sets, many existing programs ran unchanged (some features were dropped). Compatibility was largely maintained. That compatibility was what AMD waved in Intel's face when Intel was trying to pivot to Itanium. Intel then had to take the walk of shame and adopt AMD's approach.
Seriously, Intel took a long view towards this. x87 was a wart on the side of a mole, and still, its unholy marriage with MMX (they shared a register set) allowed existing programs to run while creating a compatibility barrier to competitors. Competitors had to be compatible and bug compatible. The guy tasked with doing this at Transmeta almost had a nervous breakdown, not from compatibility (easy) but from bug compatibility.
I'm not saying it's impossible! They certainly have plenty of space in the EVEX scheme. But extending the GPRs is a much bigger lift, tooling-wise, than is adding a relatively disjoint ISA extension. Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.
Besides, register renaming seems to be working splendidly at the uarch level. Why complicate the architectural model when the gains are already present?
> Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.
Just wanted to throw out there that it was AMD that came up with x86-64's ISA rather than Intel. Intel was still pushing Itanium hard at the time.