"No FPGAs, no microcontrollers, just discrete logic"
Very hard to believe that... How? Is it a SUBLEQ-style Turing tarpit that uses a very high clock and simple hardware to run RISC-V "in software" at a slower clock? Do the discrete logic gates get woven into an ad-hoc FPGA?
Considering the breadboard CPUs I saw a few years ago, it just doesn't look like enough board space... I could be wrong.
A few years ago I considered trying to build a simple 32-bit RISC processor out of 74xx series logic chips. I had just taken parts 1 to 3 of the MITx MOOC 6.004x, "Computation Structures".
Part 1 covered CMOS, combinatorial logic, sequential logic, FSMs, performance considerations, and the like, and culminated with designing a 32-bit ALU that did add, sub, all 16 logic ops, compares, and shifts (logical or arithmetic) of 0 to 31 bits in either direction.
In part 2 the students design a 32-bit processor and implement it in the simulator. The design is all at the gate level, except we were given two things as black boxes: (1) a register file of 32 32-bit registers, and (2) a ROM with 64 18-bit words.
Part 3 adds caching and other performance topics.
Here was the parts list I came up with for my design, not counting whatever it would take to implement the 32x32 register file and the 64x18 ROM. In what follows, the name of a logic element (NOR, MUX, etc.) followed by a number means an element with that number of inputs. E.g., NOR2 is a 2-input NOR gate, and MUX4 is a 4-input multiplexor. DREG is something that can store 1 bit, which would probably be a D flip-flop.
That came out to around 350 chips. My biggest breadboard could hold about 32 chips, so I'd need around 11 or 12 of those, plus whatever more would be needed for the register file and ROM.
353 of those 563 MUX2s are in the shifter in the ALU, which can handle left or right arithmetic or logical shifts by 0 to 31 bits in one clock cycle. If I added a new instruction that just does a logical right shift by 1 and made the old shift instructions all trap so they could be handled in software, that would get rid of most of those 353 MUX2s.
That would cut the chip count to around 270. That was still more than I was willing to deal with, so that was the end of that.
Given that a PCB gets you higher density than a breadboard, I think mine would have easily fit (even with the full shifter) in the amount of space it looks like they are using, with plenty left over, if I'm estimating the size of their boards right from the photos. So I don't see anything obviously implausible about their project.
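For intuition about where those 353 shifter MUX2s come from, here's a rough sketch (my own illustration, not the actual design) of one direction of a 32-bit logarithmic shifter built from 2-input muxes: five stages that conditionally shift by 1, 2, 4, 8, and 16, which is already 5 x 32 = 160 MUX2s before handling the other direction and arithmetic sign fill.

    # One direction of a 32-bit logarithmic shifter built from MUX2s.
    # Each stage conditionally shifts by 2^i, so five stages cover 0..31.
    # Every stage costs one MUX2 per output bit: 5 stages x 32 bits = 160.

    def mux2(sel, a, b):
        """One 2-input mux: output a when sel is 0, b when sel is 1."""
        return b if sel else a

    def shift_right_logical(bits, amount):
        """bits: list of 32 ints, index 0 = LSB; amount: 0..31."""
        for i in range(5):                  # stages shift by 1, 2, 4, 8, 16
            sel = (amount >> i) & 1
            step = 1 << i
            bits = [mux2(sel, bits[j],
                         bits[j + step] if j + step < 32 else 0)
                    for j in range(32)]
        return bits

Supporting left shifts and arithmetic fill on top of that (extra muxing to reverse the operand and pick the fill bit) is roughly how the count climbs toward the 353 I had.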
> If I added a new instruction that just does a logical right shift by 1 and made the old shift instructions all trap
You can use microcode for that. Microcode can also be used to implement multiply/divide by adding/subtracting in a loop. Microcode is very bad for performance, but in a DIY project it can save a lot of chips. It was widely used in computers of the '60s and '70s.
And the simplest microcode sequencer is made of 1 (one) register and 1 (one) ROM. Although, for a RISC-V you would probably need more than 1 ROM (microcode is usually very wide, 16 or more bits).
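A minimal sketch of that one-register-one-ROM sequencer, simulated in Python; the microword layout (control bits plus a next-address field) is invented for illustration:

    # One register (the micro-PC) plus one ROM: each microword carries the
    # control lines for this cycle and the address of the next microword,
    # so sequencing is just a registered ROM lookup. Encoding is made up.

    UCODE = [
        (0b0001, 1),  # 0: latch operands, go to 1
        (0b0010, 2),  # 1: shift right by 1 / decrement a counter
        (0b0010, 1),  # 2: in real hardware, a status input would be mixed
                      #    into the ROM address here to exit the loop
        (0b1000, 0),  # 3: write back, fetch the next instruction
    ]

    upc = 0                       # the single register
    for _ in range(6):            # run a few microcycles
        control, upc = UCODE[upc]
        print(f"control lines = {control:04b}")

A multi-bit shift or a multiply then becomes a loop of cheap single-step micro-ops: slow, but it trades clock cycles for chips.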
I think they use it now because they can implement the bajillion x86 instructions in terms of much simpler and more efficient micro-ops, without implementing every single x86 instruction completely uniquely.
For counting chips, remember that each 74xx chip typically has multiple gates. For example, the 7408 has four AND gates, and the 7404 has six inverters.
Sometimes, many gates! The 74xx series shows the progress of semiconductor technology over time. Later 74xx series chips are MSI. Oh, that term is quite antiquated now, isn't it? Medium-scale integration. (Just about everything is far beyond VLSI these days. Very large scale integration.)
The 7400, the first device in the series, was just four NAND gates in one IC. NAND is the easiest function to implement in TTL (and most other logic families). The somewhat later 7476 is a dual JK flip-flop. It takes eight NAND gates to make a JK flip-flop, so a 7476 is about four times denser than the 7400. And devices like the even later 74161 4-bit synchronous binary counter have many dozens of gates [1], being roughly four times as much integration again over the 7476.
The 7400, 7476, and 74161 were at the cutting edge of integration in 1963, 1966, and 1969, respectively.
I built a simple 16-bit RISC processor with a custom instruction set during my engineering coursework... inside a Xilinx Artix-7 FPGA. It was still a lot of work.
> Given that a PCB gets you higher density than a breadboard...
I can see the appeal of wanting to do this with individual gates on a PCB instead of using Verilog (you can draw bare gates and let Vivado synthesize that too), but you'd have to be crazy to want to do this with breadboards! I can only imagine the frustration of dealing with signal integrity across the rat's nest of individual wires. Which connection looks good but has a loose spring terminal? Yikes! Soldering on perfboard or old-school wire wrap would seem much more reliable, and PCBs are so cheap now that I'd go in that direction for sure.
There's a Logisim simulation of the uarch. Assuming it matches the boards, it looks like a relatively straightforward pipeline-less implementation of a RISC.
And it'd be much larger on a breadboard. Breadboards don't allow your wiring and connections to be nearly as dense as even a two-layer board. This project seems doable to me, with relatively high-integration chips handling a lot of the heavy lifting, and maybe an EEPROM or two as a poor man's PLD for certain kinds of combinational logic.
> and maybe an EEPROM or two as a poor man's PLD for certain kinds of combinational logic.
The entire ALU and Control Unit of the CPU are programmed in EEPROMs. The ALU uses 7 ROMs [1], the Control Unit uses 3 ROMs [2], the program counter uses 5 ROMs [3], and the bit shifter uses another ROM [4], so I see 16 EEPROMs in total. Thus, hundreds if not thousands of logic gates are replaced by bitstreams in EEPROMs. This CPU is certainly based on a design philosophy of "EEPROMs as a poor man's FPGA" (not meant in a discouraging way; I find the design rather interesting).
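To make "ALU in a ROM" concrete, here's a sketch of generating the image for a single EEPROM acting as a 4-bit ALU slice; the address/data layout is my own invention for illustration, not the project's actual ROM map:

    # Build the image for one EEPROM acting as a 4-bit ALU slice.
    # Address = {op[1:0], cin, a[3:0], b[3:0]} = 11 bits -> 2048 entries.
    # Data    = {cout, result[3:0]} packed into one byte.

    image = bytearray(1 << 11)
    for addr in range(len(image)):
        b   =  addr        & 0xF
        a   = (addr >> 4)  & 0xF
        cin = (addr >> 8)  & 1
        op  = (addr >> 9)  & 0x3
        if op == 0:   r = a + b + cin            # ADD with carry in
        elif op == 1: r = a + (~b & 0xF) + cin   # SUB via two's complement
        elif op == 2: r = a & b                  # AND
        else:         r = a | b                  # OR
        image[addr] = (((r >> 4) & 1) << 4) | (r & 0xF)

    with open("alu_slice.bin", "wb") as f:       # burn this into the EEPROM
        f.write(image)

Chain several such slices together through the carry bits and you get a wide ALU out of a handful of ROMs, which is why seven ROMs for a 32-bit ALU sounds entirely plausible.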
Yeah, that is cool, because I hear that the tooling for FPGAs is still very proprietary. And it doesn't do much good if one's CPU is "click the CPU button on the FPGA GUI".
EEPROMs, I assume, you can build your own programmer pretty easily?
> EEPROMs, I assume, you can build your own programmer pretty easily?
Programming an EEPROM by bitbanging it from a microcontroller is a basic programming exercise. First, send a special unlock sequence according to the datasheet (usually with a lot of 0x55 and 0xAA). Next, put all the address bits on the address lines and strobe them in via a control line. Finally, put all the data bits on the data lines and strobe them in via a control line. You only need around 100 lines of code to create a basic programmer (although making a universal one capable of programming all existing models on the market would be non-trivial, as it requires a massive look-up table similar to the one in "flashrom").
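A sketch of what that looks like for a 28C256-style parallel EEPROM: set_address(), set_data(), and pulse_we() are hypothetical helpers standing in for whatever GPIO calls your microcontroller provides, and the unlock addresses are the commonly documented software-data-protection sequence; always check the actual datasheet for your part.

    import time

    # Hypothetical pin helpers; replace the bodies with your MCU's GPIO calls.
    def set_address(addr): print(f"A <= {addr:04x}")   # drive address pins
    def set_data(value):   print(f"D <= {value:02x}")  # drive data pins
    def pulse_we():        print("WE pulse")           # strobe write-enable

    def write_cycle(addr, value):
        set_address(addr)   # put all the address bits on the address lines
        set_data(value)     # put all the data bits on the data lines
        pulse_we()          # strobe write-enable low, then high again

    def write_byte(addr, value):
        # Software data protection unlock (the "lots of 0x55/0xAA" part),
        # as commonly documented for 28C256-style parts:
        write_cycle(0x5555, 0xAA)
        write_cycle(0x2AAA, 0x55)
        write_cycle(0x5555, 0xA0)
        write_cycle(addr, value)
        time.sleep(0.01)    # wait out the internal write cycle (~10 ms)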
I recently contemplated using a CPLD for a project and discovered that CPLDs (at least the ones I was able to find suppliers and documentation for) are even less open-source friendly than FPGAs; none of the ones I found seemed to have any open source tooling.
I would love to be corrected on this because, other than the proprietary tooling that only supports Windows … a CPLD seemed perfect for this project.
From the picture, the logic chips are all in SOIC packages. The use of surface-mount components with a 4-layer PCB should already significantly boost routing density compared to a breadboard with DIP chips. All the chips can be tightly packed together.
Furthermore, both the ALU and the Control Unit are entirely in EEPROMs. The ALU uses 7 ROMs [2], the Control Unit uses 3 ROMs [3], the program counter uses 5 ROMs [4], and the bit shifter uses another ROM [5], so I see 16 EEPROMs in total. This means the discrete components needed for random logic are largely eliminated, consolidating possibly hundreds (or thousands?) of gates into just a few chips and some lookup tables to program. In fact, another maker has already demonstrated that it's possible to build a functional CPU entirely from RAM and ROM, with just 15 chips in total. [6]
Programmers usually think of ROMs as data storage devices, but they are also the most rudimentary form of programmable logic: they transform x bits of address input into arbitrary y bits of data output, so they can implement arbitrary combinational logic. In fact, lookup tables are the heart of modern FPGAs. You could argue that this makes any ROM-based design an ad-hoc FPGA (especially when EEPROMs became so large after the 1980s: 64 K for 16-bit chips). But using Mask ROMs and PLAs in Control Units has always been a legitimate and standard way to design CPUs, even back in the '70s, so I won't call it "cheating" (and using ROMs for the ALU or Control Unit isn't really much different from using a pre-made 74181 or AMD Am2900 anyway).
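The point is easy to demonstrate: "synthesizing" any combinational function into a ROM is just filling in its truth table. A toy sketch (my own, not from the project):

    # A ROM maps x address bits to y data bits, so any combinational
    # function fits: "synthesis" is just enumerating the truth table,
    # exactly the way an FPGA LUT stores one.

    def rom_from_function(fn, addr_bits):
        """Tabulate fn over every address to produce the ROM image."""
        return bytes(fn(a) & 0xFF for a in range(1 << addr_bits))

    # Example: a full adder. 3 address bits in; sum on bit 0, carry on bit 1.
    def full_adder(addr):
        a, b, cin = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1
        s = a ^ b ^ cin
        cout = (a & b) | (cin & (a ^ b))
        return (cout << 1) | s

    image = rom_from_function(full_adder, addr_bits=3)  # 8-byte "netlist"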
Thanks for pointing this out.
In the olden days it used to be fairly common to use EEPROMs (or just PROMs) with a few latches as a state machine. This way it was possible to accomplish many of the things you'd need a CPU for, but without needing one at all.
It's not too far a leap to add an ALU and some control flow after that, hehe.
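That classic PROM-plus-latch state machine is easy to sketch: the latch holds the current state, the ROM address is {state, inputs}, and the ROM data is {next state, outputs}. The encoding below is invented for illustration:

    # PROM + latch state machine. Each clock, the latch captures the
    # "next state" field out of the ROM; the rest of the ROM word drives
    # the output lines directly. No CPU anywhere.

    ROM = bytearray(32)              # 4 state bits + 1 input bit of address
    for s in range(3):               # toy counter: advance while input = 1
        ROM[(s << 1) | 0] = (s << 4) | s               # hold
        ROM[(s << 1) | 1] = (((s + 1) % 3) << 4) | s   # advance

    def step(state, input_bit):
        word = ROM[(state << 1) | input_bit]
        return word >> 4, word & 0xF                   # next state, outputs

    state = 0
    for bit in [1, 1, 0, 1]:
        state, outputs = step(state, bit)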
The use of microcode in CPUs began like this too. Originally, people replaced the random logic in a CPU's Control Unit with a Mask ROM or a PLA. The goal was simply to avoid the trouble of building decoders from gates one by one. Although it's a form of programmable logic, few would describe these simple ROM lookup tables as "programs". Later, people pushed the idea further and built a small state machine to control the ALU, with the microcode in ROM used in turn to control that state machine. This marked the birth of "micro-sequencers". In addition to their internal use in integrated circuits, they also found applications as universal building blocks for custom discrete CPUs in the 1980s; examples include the AMD Am2900, Am29100, and Am29300 series chips. Building a CPU became as easy as connecting these logic blocks together and writing some microcode for the micro-sequencer. Continue the development of microcode further and you eventually get the final form, a CPU within a CPU, at which point you may argue the CPU is not really hardware anymore but is driven by software.
At which point does the use of microcode make a CPU software rather than hardware? There's really no clear boundary, so microcode has always existed on the blurry line between hardware and software. This was also a heavily contested subject in several lawsuits involving cloned CPUs, as hardware is not covered by copyright, but software is.
> Do the discrete logic gates get woven into an ad-hoc FPGA?
Like literally? Wouldn't that require even more chips?
Or are you saying that the act of combining logic chips itself constitutes a 'buggy, poorly specified [FPGA]'? In which case, aren't you erasing the distinction between an FPGA and the alternative?
There is no way it implements the entire RV32I base ISA. Maybe it only implements the instructions that are relevant for CS students taking a computer architecture class. That is typically what these breadboard CPUs implement.
RV32I is pretty tiny. Particularly if you squint hard enough at the spec. FENCE can be a nop. EBREAK and ECALL can be something like a halt of the machine.
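To illustrate the squinting, here's how a minimal decoder (in an emulator, or equally a microcode dispatch table) might treat those opcodes; the two opcode values are the real RV32I encodings, the rest is a toy sketch:

    # Squinting at RV32I: on a simple in-order machine with no caches,
    # FENCE has nothing to order, and ECALL/EBREAK can just stop the clock.

    def execute(instr):
        opcode = instr & 0x7F
        if opcode == 0b0001111:      # FENCE (MISC-MEM)
            return "nop"
        if opcode == 0b1110011:      # SYSTEM: ECALL (imm=0), EBREAK (imm=1)
            return "halt"
        return "dispatch to the ALU / load-store / branch paths"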
Arguably, RV32I is "only the instructions that are relevant for CS students taking a computer architecture class", judging by the history in its earliest published spec [0].