The silicon backplanes are themselves soldered onto (advanced) PCB backplanes, so they are just interposers.
I do not think that using various kinds of silicon bridges warrants a change in name. It should be fine to call any such product that uses multiple chips an MCM (multi-chip module). Even for consumer CPUs, MCMs already have a long history, e.g. with the AMD Bulldozer variants, or with the earlier Pentium and Athlon models that had the L2 cache memory on a separate chip.
The word "chiplet" is relatively new and its appearance is justified for naming a chip whose purpose is to be a part of an MCM, instead of being packaged separately. A modern chiplet should be designed specifically for this purpose, because its I/O buffers need very different characteristics for driving the internal interfaces of the MCM than for driving interfaces on a normal PCB, i.e. they can be smaller and faster.
There is some kind of reciprocity, e.g. when an atom absorbs light and passes into a higher energy level, it will spontaneously emit light and fall back to the initial energy level, with about the same probability with which it absorbed the light.
However, this reciprocity is frequently circumvented, because atoms and ions have a lot of energy levels. Instead of re-emitting the light, the atom may pass more quickly to another energy level, and from there it may emit light with a very different probability (and of a different frequency, i.e. this is fluorescence).
While in fluorescence light of a lower frequency is emitted, there is also the opposite case. In very intense light, e.g. from lasers, multi-photon absorption may happen. In that case there is also no reciprocity, because the atom has jumped by an energy greater than that of any single incoming photon, so it may re-emit light of a higher frequency.
With rubidium atoms, multi-photon absorption is very frequently used for Doppler-effect-free spectroscopy (by absorbing photons that come from opposite directions, so that the effects of the atom's motion cancel). In comparison with other atoms, rubidium vapor cells are easy to procure for spectroscopy experiments or for use in frequency or wavelength standards, but they are still rather expensive, especially when enriched in only one of the two rubidium isotopes (e.g. if you want just rubidium-87 instead of natural rubidium, a Rb vapor cell may cost close to $1200).
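As a quick first-order sketch of why the opposite directions make the Doppler effect cancel: for an atom moving with velocity v along the beam axis, one photon appears blue-shifted and the other red-shifted by the same amount, so the sum of the two absorbed frequencies does not depend on v:

    \omega_1 = \omega_L (1 + v/c), \qquad \omega_2 = \omega_L (1 - v/c)
    \Rightarrow\ \omega_1 + \omega_2 = 2\,\omega_L \quad \text{(independent of v, to first order in v/c)}

so every atom in the vapor, whatever its velocity, can absorb two photons whose frequencies add up to the transition frequency.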
32 registers have been used in many CPUs from the mid-seventies until today, i.e. for half a century.
During all this time there has been a consensus that 32 registers are better than fewer registers.
A few CPUs, e.g. SPARC and Itanium, have tried to use more registers than that, and those attempts have been considered unsuccessful.
There have been some inconclusive debates about whether 64 architectural registers might be better than 32, but the fact that 32 are better than 16 has been pretty much undisputed. So there is a good chance that 32 is close to the optimal number of GPRs.
Using 32 registers is something new only for Intel and AMD, due to their legacy, but it is something that all of their competitors have used successfully for many decades.
I have written many assembly language programs for ARM, POWER and x86. Whenever I had 32 registers, it was much easier to avoid most memory accesses, whether for spilling or for other purposes. It is true that a compiler is dumber than a human programmer and that it frequently has to adhere to a rigid ABI, but even so, I expect that on average even a compiler will succeed in reducing register spilling when it has 32 registers.
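As a rough illustration (a made-up function, not taken from any real program), this is the kind of code where the difference shows up: about twenty values stay live across the whole loop, which cannot all fit in 16 architectural GPRs (minus the stack pointer and whatever the ABI reserves), but fit comfortably in 32.

    /* Made-up example: ~20 accumulators stay live across the whole loop.
       With 16 architectural GPRs a compiler typically has to spill some of
       them to the stack inside the loop; with 32 GPRs (AArch64, POWER,
       RISC-V) they normally all stay in registers. */
    #include <stddef.h>
    #include <stdint.h>

    uint64_t many_live(const uint64_t *a, size_t n)
    {
        uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0;
        uint64_t s8 = 0, s9 = 0, s10 = 0, s11 = 0, s12 = 0, s13 = 0, s14 = 0, s15 = 0;
        uint64_t s16 = 0, s17 = 0, s18 = 0, s19 = 0;
        for (size_t i = 0; i < n; i++) {
            uint64_t x = a[i];
            /* every accumulator is updated, so all of them remain live */
            s0 += x;        s1 ^= x;        s2 += x >> 1;   s3 ^= x >> 2;
            s4 += x >> 3;   s5 ^= x >> 4;   s6 += x >> 5;   s7 ^= x >> 6;
            s8 += x >> 7;   s9 ^= x >> 8;   s10 += x >> 9;  s11 ^= x >> 10;
            s12 += x >> 11; s13 ^= x >> 12; s14 += x >> 13; s15 ^= x >> 14;
            s16 += x >> 15; s17 ^= x >> 16; s18 += x >> 17; s19 ^= x >> 18;
        }
        return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9 + s10 + s11
             + s12 + s13 + s14 + s15 + s16 + s17 + s18 + s19;
    }

Comparing what a compiler emits for this at -O2 for x86-64 versus AArch64 or POWER should show the extra stack traffic in the x86-64 version.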
Yes, the argument for increasing the number of GPRs is precisely to eliminate the register spilling that is necessary in x86-64 programs whenever a program has already used all the available architectural registers, a spilling that does not happen when the same program is compiled for AArch64 or IBM POWER.
ARM AArch64, IBM POWER and most other RISC CPUs have had 32 general-purpose registers, in most cases for decades, and this has always been an advantage for them, not a disadvantage.
So there already is plenty of experience with ABIs using 32 GPRs.
Even so, unnecessary spilling could be avoided, but that would require some changes in programmer habits. For instance, one could require an ordering of the compilation of source files, e.g. always compiling the dependencies of a program module before the module itself. The compiler could then annotate the module interfaces with register-usage information and use these annotations when compiling a program that imports the modules. This would avoid the need to define an ABI that is sub-optimal in most cases, because it either saves too many or too few registers.
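A purely hypothetical sketch of what such an annotation might look like (no mainstream compiler implements this; the macro name and the register list are invented for illustration):

    /* Hypothetical only: the annotation is an empty macro so the header still
       compiles with an ordinary C compiler; in the imagined scheme it would be
       generated automatically after compiling the module body. */
    #define USES_REGS(regs)   /* register-usage note recorded by the compiler */

    /* vec_math.h -- interface annotated after vec_math.c has been compiled */
    USES_REGS("x0-x3, x9-x11")          /* invented AArch64 register list */
    long dot3(const long *a, const long *b);

    /* A caller that trusts this annotation knows that values kept in any other
       register survive the call, even nominally caller-saved ones, so it does
       not have to spill them around the call site. */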
Yeah, this approach has been proposed, in an even more aggressive form, see David W. Wall, "Global Register Allocation at Link Time", 1986 [0], specifically for machines with a large number of registers: the paper straight up says that it's much less of a problem if you a) have fewer registers, b) don't compile things separately. And this problem with large register files had been predicted long before, see the wonderfully named "How to Use 1000 Registers", 1979, by Richard L. Sites (he would later go on to work on the VAX and design the Alpha ISA at DEC).
Register allocation at link time would work only for ISAs where all registers are equivalent, or where they are partitioned into only a small number of equivalence classes.
It cannot work for an ISA like x86-64, where register allocation and instruction selection are extremely interdependent (e.g. the 16 Intel-AMD general-purpose registers are partitioned into 11 equivalence classes, instead of into at most 2 to 4 equivalence classes, as in the majority of other ISAs, where only a stack pointer and perhaps a return-link register and/or a zero register have special behavior). With such an ISA, the compiler must know the exact register allocation before choosing instructions, otherwise the generated program can be much worse than an optimal program. Moreover, an ideal optimizing compiler for x86-64 might need to generate instructions for several alternative register allocations, then select the best allocation and generate the definitive instruction sequence.
With such a non-orthogonal ISA as x86-64, for the best results you need to allocate registers based on the register usage of the invoked procedures, as assembly programmers typically do, instead of using a uniform, sub-optimal ABI, as most compilers do.
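A few of the well-known x86-64 constraints behind those equivalence classes, shown on a trivial function (any decent compiler handles this automatically; the point is only that instruction choice fixes the registers and vice versa):

    /* Why instruction selection and register allocation are entangled on
       x86-64 (simplified, well-known ISA constraints):
         - widening MUL and DIV/IDIV implicitly use RDX:RAX,
         - a variable shift count must be in CL, unless the compiler switches
           to the BMI2 SHLX/SHRX instructions,
         - byte accesses to SPL/BPL/SIL/DIL need a REX prefix, which makes
           AH/BH/CH/DH unusable in the same instruction. */
    #include <stdint.h>

    uint64_t constrained(uint64_t a, uint64_t b, unsigned c)
    {
        uint64_t q = a / b;          /* DIV: dividend in RDX:RAX, quotient in RAX */
        uint64_t r = a % b;          /* the remainder of the same DIV is in RDX   */
        uint64_t s = a << (c & 63);  /* SHL: the count must be moved into CL      */
        return q + r + s;
    }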
In the Intel-AMD CPUs, there are separate register files for renaming the 16 general-purpose registers (which will become 32 registers in Intel Nova Lake and Diamond Rapids, by the end of this year) and for renaming the 16 (AVX) or 32 (AVX-512) vector registers.
Both register files contain a few hundred physical registers (scalar and vector, respectively).
Besides these two big register files, there are a few additional rename registers for some special registers, e.g. the flags register and the AVX-512 mask registers.
There are no renaming differences among the general-purpose registers: any of the 16 registers can be mapped to any of the hundreds of hidden physical registers, regardless of whether the register name used in the program is RAX, RCX or whatever.
Some differences between apparently similar instructions may be caused not by the fact that they use RAX or another register, but by whether they affect the flags or not, because the number of renaming registers available for flags is much smaller than the hundreds available for GPRs.
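A small illustration (GCC/Clang inline asm, x86-64 only): two ways to add 1 that look interchangeable, except that ADD writes the arithmetic flags while LEA does not, so only the first one consumes flag-renaming resources:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t a = 41, b = 41;

        /* ADD writes OF/SF/ZF/AF/CF/PF ... */
        asm ("add $1, %0" : "+r"(a) : : "cc");
        /* ... while LEA computes the same sum without touching any flags. */
        asm ("lea 1(%1), %0" : "=r"(b) : "r"(b));

        printf("%" PRIu64 " %" PRIu64 "\n", a, b);   /* prints: 42 42 */
        return 0;
    }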
Yes, but you transition between the two modes with far jumps, far calls or far returns, which reload the code segment register.
Without passing through a far jump/call/return, you cannot alternate between instructions that are valid only in 32-bit mode and instructions that are valid only in 64-bit mode.
Normally you would have 32-bit functions embedded in a 64-bit main program, or vice versa. Unlike normal functions, which are invoked with near calls and end in near returns, such functions would be invoked with far calls and would end in far returns.
However, there is no need to write such hybrid programs now. The 32-bit compatibility mode exists mainly for running complete legacy programs, which have been compiled for 32-bit CPUs.
By "existing OSes", that really means Microsoft Windows, other OSes would not have had any problems with the negligible update required to save and restore more registers.
During many decades, Intel has introduced a lot of awful workarounds in their CPUs for the only reason that Microsoft was too lazy to update their OS so the newer better CPUs had to be managed by the OS exactly in the same way as the old worse CPUs, even if that moved inside the CPUs various functions that can be done much more efficiently by the OS, so their place is not inside the CPU.
So the MMX registers were aliased over the FPU registers because in this way the existing MS Windows saved them automatically at thread switching. Eventually the limitations of MMX were too great, and due to competitive pressure from AMD (3DNow!) and Motorola (AltiVec), Intel and Microsoft were forced to transition to SSE in 1999, for which a couple of new save and restore instructions have been added and used by the OS, allowing an increase in the number and size of registers.
Qualcomm also had their own architectural license, but Arm claimed that neither of the two licenses was applicable.
At the time, I read the documents presented by both parties, but essential details of the contracts were missing from the public documents, so it was impossible to know which of Qualcomm and Arm was right.
Nevertheless, the argumentation presented by Qualcomm seemed far more plausible, unless it was contradicted by some of the redacted contract details.
Arm has not shown in public any information that could have proven that they were right, so neither they nor their supporters can claim that the judge made a wrong decision.
Even supposing that the decision was wrong, Arm preferred to keep its arrangements with customers secret instead of proving that it was right, presumably because it might lose more money if other customers learned the exact details of the contract with Qualcomm and could then request similar terms.
While I believe that it was right for Qualcomm to win the trial, I strongly dislike the fact that Qualcomm now designs its own cores instead of licensing them from Arm.
The reason is that the Arm cores have excellent documentation, on par with that of the Intel-AMD CPUs, while the Qualcomm cores, like the Apple cores, have practically non-existent documentation. Moreover, Qualcomm is so silly that they have always obfuscated even which Arm cores they used in their older products. Whenever I evaluated a smartphone with a Qualcomm SoC, it was impossible to find any useful information on the Qualcomm site; I had to go to various third parties to learn what is actually inside the SoC, in order to compare it with the alternatives.
No, Qualcomm was willing to pay the Nuvia license fees, but Arm said that this was too little, because they had licensed Nuvia only for products to be sold for servers, which was expected to be a small market, while Qualcomm now wanted to reuse some of that work in laptop CPUs.
So Arm requested increased license fees, which Qualcomm refused, claiming that the license contracts held by both Qualcomm and Nuvia remained valid, with their already negotiated fees.