How Does a CPU Work - CHIP-8 Emulation From Scratch

Before we can talk about emulators, we first need a basic understanding of how a CPU works.

A CPU reads instructions from memory, and each instruction tells it what to do. A CPU may be really fast, but it's stupid: it does exactly what it is told and nothing more. You have to be very explicit and logical to get it to do what you want, and you do that using a fixed, discrete set of instructions.
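
To make "reading instructions from memory" concrete, here is a minimal sketch of the fetch-decode-execute loop that every CPU runs in some form. The instruction set is made up for illustration (opcode 0x01 adds the next byte to an accumulator, 0x00 halts); the function name `run` and both encodings are hypothetical and don't belong to any real CPU.

```python
def run(memory: bytes) -> int:
    """Run a program for a hypothetical toy CPU and return the accumulator."""
    acc = 0  # accumulator: the register that arithmetic results land in
    pc = 0   # program counter: address of the next instruction to fetch
    while True:
        opcode = memory[pc]                        # fetch
        if opcode == 0x00:                         # decode: HALT
            return acc
        elif opcode == 0x01:                       # decode: ADD immediate
            acc = (acc + memory[pc + 1]) & 0xFF    # execute; 8-bit register wraps at 256
            pc += 2                                # skip past opcode and operand
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {pc}")


program = bytes([0x01, 0x22, 0x01, 0x10, 0x00])  # ADD $22, ADD $10, HALT
print(hex(run(program)))  # 0x32
```

The loop never "understands" the program; it just keeps fetching the byte at the program counter and doing whatever that byte's encoding says.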

For example, consider this instruction for the CPU inside the original Game Boy: $C622. It encodes an operation and its data as a single number that the machine can decode.

Different instructions are encoded in different ways. In this case, the first byte ($C6) acts as the "Opcode" (operation code), and the second byte ($22) is the "Operand" (the data being acted upon).

To find this instruction in the standard GB opcode table, treat the first byte as a pair of coordinates:

The Row (high nibble, C): Look down the left side of the table for the row labeled Cx.
The Column (low nibble, 6): Look across the top for the column labeled x6.

Where these two intersect, you will find the instruction ADD A, n8.
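
Splitting the instruction into those pieces takes only shifts and masks. This sketch treats $C622 as one 16-bit integer for convenience (in the Game Boy's memory it is really two consecutive bytes, $C6 then $22); the variable names are illustrative.

```python
instruction = 0xC622

opcode = instruction >> 8      # first byte: 0xC6
operand = instruction & 0xFF   # second byte: 0x22

row = opcode >> 4              # high nibble 0xC -> row "Cx"
col = opcode & 0x0F            # low nibble 0x6 -> column "x6"

print(f"opcode={opcode:#04x} operand={operand:#04x}")  # opcode=0xc6 operand=0x22
print(f"row={row:X}x column=x{col:X}")                  # row=Cx column=x6
```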

Breaking Down the Components

The Mnemonic (ADD A, n8): This tells the programmer that the CPU will add an 8-bit "immediate" value (n8) to Register A (the Accumulator).
The Immediate Value ($22): The table lists this instruction as 2 bytes long because the opcode is always followed by a one-byte operand. When the CPU decodes $C6, it reads the very next byte in memory, in this case $22, as the value to add.
Execution: When the CPU processes $C622, it reads the current value of Register A, adds $22 to it, and stores the result back in Register A (wrapping around past $FF, since A holds only 8 bits). It also updates the CPU's flags: Z is set if the result is zero, N is cleared because this was an addition, H is set on a carry out of bit 3 (the "half carry"), and C is set if the sum overflowed the register's 8-bit limit.
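
The execution step can be sketched as a pure function. This is a simplified model assuming the flag rules commonly documented for this opcode (Z if the result is zero, N cleared, H on a carry out of bit 3, C on a carry out of bit 7). The name `add_a_n8` and the dict of flags are illustrative choices; on real hardware the flags are bits in the F register.

```python
def add_a_n8(a, n8):
    """Sketch of Game Boy ADD A, n8: return the new value of A and the flags."""
    total = a + n8
    result = total & 0xFF                              # A holds 8 bits, so wrap past $FF
    flags = {
        "Z": result == 0,                              # result is zero
        "N": False,                                    # cleared: this was an addition
        "H": (a & 0x0F) + (n8 & 0x0F) > 0x0F,          # carry out of bit 3 ("half carry")
        "C": total > 0xFF,                             # carry out of bit 7
    }
    return result, flags


print(add_a_n8(0xDE, 0x22))  # (0, {'Z': True, 'N': False, 'H': True, 'C': True})
```

Here $DE + $22 = $100, which doesn't fit in 8 bits, so A wraps to 0 and every flag except N reports it.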

As explained above, CHIP-8 was not a real physical CPU but a virtual machine with its own instruction set; still, the same principles apply. For example, consider the CHIP-8 instruction $7522.

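Every CHIP-8 instruction is exactly 2 bytes long. In $7522, the first byte ($75) packs two things together: the high nibble 7 means "add a value to a register," and the low nibble 5 selects Register 5 (V5). The second byte ($22) is the value to be added. So this instruction says ADD $22 to Register 5. Unlike the Game Boy's ADD, it does not update a carry flag.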
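
Decoding and executing $7522 in code again comes down to shifts and masks. CHIP-8 references usually write this opcode as 7XNN, where X is the register and NN the byte to add. This sketch assumes the instruction is held as one 16-bit integer and the sixteen registers as a Python list; `execute_7xnn` is a hypothetical helper name.

```python
def execute_7xnn(instruction, v):
    """Sketch of CHIP-8 7XNN: add the byte NN to register VX, in place."""
    if instruction >> 12 != 0x7:
        raise ValueError(f"not a 7XNN instruction: {instruction:#06x}")
    x = (instruction >> 8) & 0x0F   # second nibble selects the register (5 in $7522)
    nn = instruction & 0xFF         # second byte is the value to add ($22 in $7522)
    v[x] = (v[x] + nn) & 0xFF       # registers are 8 bits, so the sum wraps at 256
    # No flag update: unlike the Game Boy's ADD, 7XNN leaves VF untouched.


v = [0] * 16      # sixteen 8-bit registers, V0 through VF
v[5] = 0x10
execute_7xnn(0x7522, v)
print(hex(v[5]))  # 0x32
```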

The $ and 0x prefixes both mean that a number is written in hexadecimal (hex). Hex is the primary radix used when working with computers at a low level because it is a much more concise way of presenting large values (like memory addresses), and because each hex digit maps onto exactly 4 bits, so a byte is always two hex digits.
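
A quick way to see why hex is convenient is to print the same number in decimal, binary, and hex. This sketch uses Python, which writes hex literals with the 0x prefix and does not understand the $ prefix, so a $-style string has to be stripped before parsing.

```python
value = 0xC622                # 0x prefix: a hex literal in Python

print(value)                  # 50722 (decimal)
print(f"{value:016b}")        # 1100011000100010 (binary)
print(f"{value:04X}")         # C622 (hex)

# Each hex digit is exactly one 4-bit group (a nibble) of the binary form.
for digit in "C622":
    print(digit, f"{int(digit, 16):04b}")   # C 1100, then 6 0110, 2 0010, 2 0010

# "$C622"-style notation: drop the "$" and parse the rest as base 16.
print(int("$C622"[1:], 16) == value)        # True
```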

Way back in the day, programmers would write programs in an assembly language rather than the high-level languages (like C++) that we use today. Assembly is as low as you can go while still being human readable. An assembler would translate their human-readable assembly into the 1s and 0s that the computer could understand.

Sticking with the earlier example, the assembly program would contain ADD V5, $22, and the assembler would translate that to $7522, which the CHIP-8 interpreter can read. The same translation still happens today; we just have another layer above assembly in the form of high-level languages.
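
The assembler's job for this one instruction can be sketched as a tiny function. It is hypothetical and only recognizes the ADD Vx, $NN form used in the example; real CHIP-8 assemblers cover the whole instruction set and their syntax varies. The regex and the `assemble` name are illustrative assumptions.

```python
import re

# Hypothetical one-instruction assembler: it only understands "ADD Vx, $NN".
ADD_IMMEDIATE = re.compile(r"ADD\s+V([0-9A-F])\s*,\s*\$([0-9A-F]{1,2})", re.IGNORECASE)


def assemble(line: str) -> int:
    """Translate one 'ADD Vx, $NN' line into its 16-bit CHIP-8 encoding (7XNN)."""
    match = ADD_IMMEDIATE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    x = int(match.group(1), 16)     # register number -> second nibble
    nn = int(match.group(2), 16)    # immediate value -> low byte
    return 0x7000 | (x << 8) | nn   # pack the fields: 7, X, NN


print(f"${assemble('ADD V5, $22'):04X}")  # $7522
```

Packing the fields with shifts and OR is the exact reverse of the masking an interpreter does when it decodes the same instruction.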