Throughout the entire processor, there are a few basic building blocks that will be used in various places. Let's have a look at some of them.
Registers
By far, the block that we'll use the most is the register: not only in the register file, but also between the stages of the pipeline.
Typically implemented with D flip-flops, a register has input pins, output pins, a clock line and an output enable line.
When the clock transitions from one state (let's say low) to the other (high), the device "saves" the inputs internally in its flip-flops and provides the same values on its outputs.
The output enable line can be used to put the outputs in high-impedance mode, which effectively disconnects them without affecting the internal operation of the flip-flops inside the device.
Some components also have a clock enable line. When it is not active, clock transitions have no effect and the internal flip-flops do not change. This could prove useful in the register file, for instance, where at most one register needs to be written during the Writeback stage.
Some other devices have a synchronous or asynchronous clear signal, which sets all the flip-flops to zero (low level). Synchronous means that the clear only happens on the clock transition; asynchronous means that the effect is immediate. Maybe that could be used to "kill" instructions when flushing the pipeline: set the entire decoded instruction to just a bunch of zeros.
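To make the role of each control line concrete, here is a small behavioural sketch in Python. It is not a model of any specific chip: the class name, the method names and the choice of `None` to stand for high impedance are all made up for illustration, and I am assuming that the synchronous clear is gated by the clock enable, which varies from one real part to another.

```python
class Register:
    """Behavioural model of a 32-bit edge-triggered register (hypothetical names)."""

    MASK = 0xFFFFFFFF

    def __init__(self):
        self.state = 0   # value held by the internal flip-flops
        self.clock = 0   # last clock level seen, used to detect rising edges

    def tick(self, clock, data, clock_enable=True, sync_clear=False):
        """Present a clock level; capture only on a low-to-high transition."""
        rising = self.clock == 0 and clock == 1
        self.clock = clock
        # Assumption: the synchronous clear is also gated by clock_enable.
        if rising and clock_enable:
            self.state = 0 if sync_clear else data & self.MASK

    def async_clear(self):
        """Asynchronous clear: takes effect immediately, whatever the clock does."""
        self.state = 0

    def output(self, output_enable=True):
        """Return the stored value, or None to stand in for high impedance."""
        return self.state if output_enable else None
```

Note that `output(output_enable=False)` hides the value without touching `state`, which mirrors the way the output enable line leaves the flip-flops alone.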
Constructing a register is fairly easy: there are components that do exactly that already. They will often be only 8 or 16 bits wide, which is too small for our 32-bit registers, but it is only a matter of using one of them for the low half-word (bits 0 to 15), another for the high half-word (bits 16 to 31), and tying together the clock and other control lines.
Adders
We will need at least three 32-bit adders in our processor. Well, the first will only ever compute PC+4 in the Fetch stage, so that one could be implemented as a 30-bit incrementer. We'll see later how an incrementer can be constructed using a simplified version of the adder.
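A quick sketch of why a 30-bit incrementer is enough: adding 4 never touches the two low bits, so only bits 2 to 31 need to be incremented. The function name is hypothetical, and the sketch assumes the PC is always aligned on 4 bytes.

```python
def pc_plus_4(pc):
    """PC + 4 computed by incrementing only bits 2..31 of the PC."""
    upper = pc >> 2                      # the 30 bits that actually change
    upper = (upper + 1) & 0x3FFFFFFF     # 30-bit increment, wraps around
    return (upper << 2) | (pc & 0b11)    # the two low bits pass straight through
```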
The second one is part of the ALU, in the Execute stage. That one needs to do subtractions and signed arithmetic as well, so it might need some poking at its inputs, depending on the function we want it to perform.
Quick reminder: in order to subtract in two's complement, we just need to bitwise invert the second operand and set the carry-in to 1.
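The reminder above can be checked with a few lines of Python. This is only an arithmetic illustration with made-up function names, not a description of how the actual adder is wired.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """32-bit addition; returns (sum, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def sub32(a, b):
    """a - b using the adder: invert the second operand, carry-in set to 1."""
    return add32(a, ~b & MASK32, carry_in=1)
```

With this scheme, a carry-out of 1 means that no borrow occurred, that is, `a` was greater than or equal to `b` when both are read as unsigned numbers.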
The third adder is found in the same stage, but its role will be solely to compute the next PC in case of a jump or a branch. Branches and the jal instruction will compute PC + some offset, while jalr needs an arbitrary base register + the offset.
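As a sketch, the two cases only differ in what goes into the first input of the adder. The function names are hypothetical and `offset` is taken as an already sign-extended immediate. The clearing of the lowest bit for jalr is not discussed above; it comes from the RISC-V specification.

```python
MASK32 = 0xFFFFFFFF

def branch_or_jal_target(pc, offset):
    """Branches and jal: target = PC + offset."""
    return (pc + offset) & MASK32

def jalr_target(rs1_value, offset):
    """jalr: target = base register + offset, with bit 0 forced to zero."""
    return (rs1_value + offset) & MASK32 & ~1
```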
All these will be constructed following the "FET-switch adder" idea that emerged from the 6502 forums.
For the sake of speed, we may end up implementing a carry-select stage.
I was also pondering implementing the jal instruction during the Decode stage (using a fourth adder), since that one only requires the PC and an immediate offset, and no source register. I then ditched this idea, though, because it will be far easier to handle all our jumps, branches and interrupts in the same stage! Trust me, we don't want to take a shortcut here.
Comparators
The ALU will need what I'd call a full arithmetic comparator, capable of doing magnitude comparisons for both signed and unsigned integers. In the RISC-V ISA, the branch comparisons are either equal, not equal, less than, or greater than or equal. There is also the "set if less than" family of instructions, which needs both signed and unsigned comparisons.
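Here is a small sketch of what such a comparator has to produce. Only three results really need to be computed (equal, unsigned less than, signed less than); the others are their complements. The signed case uses a well-known trick: flipping the sign bit of both operands turns a signed comparison into an unsigned one. The function and key names are mine, and this is not necessarily how the final comparator will be built.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def compare(a, b):
    """Return every branch condition for two 32-bit values."""
    a &= MASK32
    b &= MASK32
    eq = a == b
    ltu = a < b                              # unsigned magnitude comparison
    lt = (a ^ SIGN_BIT) < (b ^ SIGN_BIT)     # signed: same comparator, sign bits flipped
    return {"eq": eq, "ne": not eq,
            "lt": lt, "ge": not lt,
            "ltu": ltu, "geu": not ltu}
```

For example, `compare(0xFFFFFFFF, 1)` reports "lt" as true (-1 is less than 1 when signed) but "ltu" as false.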
Many other comparators, as far as I can imagine at this point, will be binary equality comparators, most of them in the Decode stage where they will be matched against parts of the instruction to produce the decoded signals.
More equality comparators will be needed in the forwarding logic. These would only be 5-bit, comparing the destination register of the first instruction with one of the source registers of the next instructions to determine if a bypass needs to be activated.
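A minimal sketch of one such bypass check follows. The function and parameter names are hypothetical. The two extra conditions are assumptions on my part rather than something decided above: the earlier instruction must actually write a register, and register x0 is never forwarded since it always reads as zero in RISC-V.

```python
def needs_bypass(rd_earlier, writes_register, rs_later):
    """True if the later instruction must take its source from the bypass."""
    rd = rd_earlier & 0x1F          # register numbers are 5 bits wide
    rs = rs_later & 0x1F
    same_register = rd == rs        # this is the 5-bit equality comparator
    return bool(writes_register and same_register and rd != 0)
```

One such check is needed per source register of the later instruction, and per pipeline stage that can provide a forwarded value.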
Also, equality comparators will likely be used to determine which memory or IO space is active, based on the address.
Here again, the comparator can be built with FET switches for speed. We will also see how smaller comparators can be cascaded to build one that is 32 bits wide.
Decoders/selectors
Selectors (or decoders) take a binary number as their input to set a single one of their outputs to a given state. A typical example for this function is the classic 3-to-8 decoder, the 74LVC138, that has 3 inputs and 8 outputs. In order to be cascaded to build wider selectors, these devices often have one or more enable pins. For instance, the 74LVC138 has three of them, one active high and two active low.
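To illustrate the cascading, here is a behavioural sketch of a 3-to-8 decoder with the same three enables, and of a 5-to-32 selector made of four of them. The function names are hypothetical. The outputs are modelled as active low, as on the '138 family. The way the two upper address bits are turned into enable signals is simplified: in hardware, this would take a little extra decoding on top of the enable pins.

```python
def decoder_138(address, e1_n, e2_n, e3):
    """3-to-8 decoder: the selected output goes low, all the others stay high."""
    enabled = e1_n == 0 and e2_n == 0 and e3 == 1
    return [0 if enabled and address == i else 1 for i in range(8)]

def decoder_5_to_32(address):
    """Four 3-to-8 decoders; the two upper address bits pick the enabled one."""
    low = address & 0b111
    high = (address >> 3) & 0b11
    outputs = []
    for chip in range(4):
        # Simplification: the active-high enable is driven directly by the comparison.
        outputs += decoder_138(low, 0, 0, 1 if high == chip else 0)
    return outputs
```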
Astorisc will need a few 5-to-32 selectors in the register file, and will also need some as yet undetermined selectors in the RAM modules, to access the right chips.