x87 and MMX's register encodings exist in (mostly) separate parts of the x86 operand encoding map. That's in contrast to GPRs, which have to squeeze into 3 (sometimes 4, with the REX prefix) bits.
That's where the incompatibility comes from -- x86-64 required an entirely new prefix to merely double the GPRs; adding a few hundred more would require some very substantial changes to the opcode map and all decoders already out there.
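To make the 3-vs-4-bit point concrete, here is a rough sketch of how a register-to-register `mov` gets encoded in 64-bit mode. The function name `encode_mov_r64_r64` is made up for illustration, and it only covers the one opcode form (`89 /r`, i.e. `MOV r/m64, r64`) with register-direct addressing. The low 3 bits of each register number go in the ModRM byte; the 4th bit has to live in the REX prefix, which is the "entirely new prefix" mentioned above.

```python
def encode_mov_r64_r64(dst: int, src: int) -> bytes:
    """Encode `mov dst, src` for 64-bit GPRs numbered 0..15 (rax=0, rcx=1, ..., r15=15).

    Illustrative sketch only: handles the single form MOV r/m64, r64 (opcode 0x89)
    with mod=0b11 (register-direct).
    """
    if not (0 <= dst < 16 and 0 <= src < 16):
        # Only 4 bits of register number are available: 3 in ModRM + 1 in REX.
        raise ValueError("register number does not fit in ModRM + REX")

    # REX prefix layout: 0100 W R X B
    #   W=1 selects 64-bit operand size
    #   R extends ModRM.reg (the source here), B extends ModRM.rm (the destination)
    rex = 0x48 | ((src >> 3) << 2) | (dst >> 3)

    # ModRM layout: mod(2) reg(3) rm(3); mod=0b11 means both operands are registers
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)

    return bytes([rex, 0x89, modrm])


print(encode_mov_r64_r64(0, 1).hex())  # mov rax, rcx
print(encode_mov_r64_r64(8, 9).hex())  # mov r8, r9
```

Note that `mov rax, rcx` and `mov r8, r9` produce the same ModRM byte and differ only in the prefix. There is nowhere left in this scheme to put a 5th register bit, let alone enough for hundreds of registers.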
When x87 and MMX were added, existing programs ran unchanged. When x86-64 doubled the register sets, many existing programs ran unchanged (some features were dropped). Compatibility was largely maintained. That compatibility was what AMD waved in Intel's face when Intel was trying to pivot to Itanium. Intel then had to take the walk of shame and adopt AMD's approach.
Seriously, Intel took a long view towards this. x87 was a wart on the side of a mole, and still its unholy marriage with MMX (they shared a register set) allowed existing programs to run while creating a compatibility barrier to competitors. Competitors had to be compatible and bug compatible. The guy tasked with doing this at Transmeta almost had a nervous breakdown, not from compatibility (easy) but from bug compatibility.
I'm not saying it's impossible! They certainly have plenty of space in the EVEX scheme. But extending the GPRs is a much bigger lift, tooling-wise, than adding a relatively disjoint ISA extension. Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.
Besides, register renaming seems to be working splendidly at the uarch level. Why complicate the architectural model when the gains are already present?
> Even if they can do it while preserving older encodings, it's just another speedbump at a time when Intel is probably anxious to make x86-64 as frictionless as possible.
Just wanted to throw out there that it was AMD that came up with x86-64's ISA rather than Intel. Intel was still pushing Itanium hard at the time.