These last few weeks, I've been working on implementing the CPU in Digital.
My initial intention was to stay as close as possible to the eventual physical implementation, but in practice that may not have been a good idea, because it didn't always play well with the simulation. Most of the time, when I tried to start the simulation, Digital would complain that more than one output was active on a wire, causing a short circuit in the shifter unit.
The problem went away when I switched from pairs of buffers driven by two complementary signals to plain old multiplexers. For the physical implementation of the processor, selection between two possible values will likely be implemented using the following logic:
When the Sel input is low, the buffer for the A input is active, while the buffer for B is kept in high impedance (meaning its output is effectively disconnected). When Sel is brought high, it's the opposite, and the output is equal to the value on the B input.
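To make the short-circuit complaint more concrete, here is a minimal Python sketch of the buffer pair. The names `tri_state`, `resolve_bus` and `select` are made up for this illustration, and treating any two simultaneously active drivers as a conflict is a simplifying assumption, not a description of how Digital checks the wire internally.

```python
def tri_state(value, enable):
    """Tri-state buffer: pass the value when enabled, else high impedance (None)."""
    return value if enable else None


def resolve_bus(drivers):
    """Resolve a shared wire: at most one driver may be active at a time."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        # Simplifying assumption: any two active drivers count as a conflict.
        raise RuntimeError("short circuit: more than one active driver")
    return active[0] if active else None


def select(a, b, sel):
    """Two buffers driven by complementary enables: A when sel is low, B when high."""
    return resolve_bus([tri_state(a, not sel), tri_state(b, sel)])


print(select(5, 9, False))  # A drives the wire
print(select(5, 9, True))   # B drives the wire
```

As long as the two enables really are complementary, only one buffer drives the wire. If both happen to be active at the same moment, even briefly, `resolve_bus` raises, which is roughly the situation the simulator was reporting.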
But for the moment, the simulation will use multiplexers, like this:
When Sel is low (0), the Output mimics A. When it's high, the Output mimics B. Functionally, this is equivalent and will behave just like the one above.
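For comparison, a 2-to-1 multiplexer can be sketched as plain AND/OR logic on whole words. The function name `mux2` and the 32-bit default width are assumptions made for this example; the point is only that the output is always driven by exactly one of the inputs, so there is no wire with several potential drivers.

```python
def mux2(a, b, sel, width=32):
    """2-to-1 multiplexer built from AND/OR gates; width is an assumed word size."""
    all_ones = (1 << width) - 1
    sel_mask = all_ones if sel else 0  # Sel replicated on every bit
    # out = (A AND NOT Sel) OR (B AND Sel), applied bit by bit
    return (a & ~sel_mask & all_ones) | (b & sel_mask)


print(hex(mux2(0x1234, 0xABCD, 0)))  # selects A
print(hex(mux2(0x1234, 0xABCD, 1)))  # selects B
```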
Running the first program
Here's what our processor looks like right now.
The Fetch, Instruction Decode, Execute, Memory, and Writeback stages appear clearly separated with the pipeline registers in cyan between them.
The Decode logic outputs a whole bunch of simple signals that will tell the next stages what action they must take.
On the top will be our control logic, currently very primitive. On the bottom will come the RAM and I/O access, for now only represented by a single RAM, not even ready to handle byte or half-word access.
As you can see, there are no bypasses or forwarding unit yet, and no stall/flush control logic. So how can we make our first program work? Simply by inserting nop instructions in our program where the processor would have needed to wait for a value to be available, or where it would have needed to flush the pipeline (to discard the instructions right after a jump or a branch). By definition, nops do nothing, and there is no harm in executing them even if they come from the wrong code path.
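The data-hazard half of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `insert_nops`, the `(op, dest, sources)` tuple format and the minimum producer-to-consumer distance of 3 slots are all hypothetical, and the sketch only handles straight-line code (the nops needed after a jump or branch are not covered).

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad straight-line code so each reader sits at least min_distance
    slots after the latest writer of any register it reads.

    min_distance=3 is an assumed value, not a measured property of the CPU.
    """
    out = []
    last_write = {}  # register name -> index in `out` of its latest writer
    for op, dest, srcs in program:
        # Earliest slot where every source register is safe to read.
        earliest = max(
            (last_write[r] + min_distance for r in srcs if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append(NOP)
        if dest is not None:
            last_write[dest] = len(out)
        out.append((op, dest, srcs))
    return out


body = [
    ("add", "a3", ("a1", "a2")),
    ("mv", "a1", ("a2",)),
    ("mv", "a2", ("a3",)),
]
for op, dest, srcs in insert_nops(body):
    print(op, dest or "", *srcs)
```

With these assumptions, the only nop added to this three-instruction body lands between the two mv instructions, because the second mv reads the a3 produced by the add.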
Padding the code with nops this way allowed me to run the following program, which computes the Fibonacci sequence until the result exceeds 1000:
_start:
li a1, 1 ; start with first number = 1
li a2, 1 ; and second number = 1
li a3, 0 ; Useless, but we'll store the result in a3
li a5, 1000 ; Set our upper limit
fib:
nop ; the next value in the sequence is
add a3, a1, a2 ; the first number + the second number
mv a1, a2 ; new first number is the old second one
nop
mv a2, a3 ; new second number is the computed value
bltu a3, a5, fib ; loop if we are not yet at 1000
nop
nop
nop
_end:
j _end ; endless loop
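As a sanity check on what the registers should contain once the program reaches the endless loop, here is a small Python model of the same loop with the nops left out. The function name `fib_until` is made up for this sketch, and the 32-bit wrap-around is an assumption about the register width.

```python
def fib_until(limit=1000, mask=0xFFFFFFFF):
    """Mirror the assembly loop; mask models an assumed 32-bit register width."""
    a1, a2, a3 = 1, 1, 0            # li a1, 1 / li a2, 1 / li a3, 0
    while True:
        a3 = (a1 + a2) & mask       # add a3, a1, a2
        a1 = a2                     # mv a1, a2
        a2 = a3                     # mv a2, a3
        if not a3 < limit:          # bltu a3, a5, fib (loop while a3 < limit)
            return a1, a2, a3


print(fib_until())  # (987, 1597, 1597)
```

So the values to look for in the simulation are 987 in a1 and 1597 in both a2 and a3.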
Maybe there are too many nops, but who cares? It runs... And soon, it will run without all those pesky nops!