Emulation is a tough topic to begin with. At least it was for me when I first wanted to understand the ins and outs of it, until I managed to write a good emulator for the Game Boy. It’s actually not so difficult to find information on the Internet, but it is harder to find thorough articles about what I like to call “the whole picture”. In my opinion, this thread on Stack Overflow has an answer that lets you quickly grasp most of the issues you’ll have to overcome when writing an emulator. It does not dive much into the actual making of the program, but it shows that the subject is not an easy one.
My first emulated system was the CHIP-8, which I built by following a tutorial that explained every detail of the implementation. It really is the system you should start with if you want to learn the basics of emulation before taking on a more advanced one like the Game Boy. It has only 35 instructions, and its screen and input are quite basic, so it’s not a problem to implement with minimal documentation (like the Wikipedia page).
What is it all about?
At the highest level, console emulation is quite easy to understand. The purpose is to create a program that, given a game file, reproduces the behavior of the original system for the player. This includes rendering the screen, playing the sound, and processing the player’s inputs so they can actually play the game as if the computer were the console. The key point is that the emulated devices can take any form you want: the input, originally a gamepad, can be represented with your keyboard, network capability (if any) may be simulated locally or over the Internet, etc. It does not matter how it works; the only important thing is that the system’s features can be used in the game you’re playing.
As I said, once the emulator is created, it only needs a game file (called a ROM) to start running. That’s logical: when you start a game console, you always insert a cartridge or disc that holds the game you want to play. So the ROM is just a binary file that contains the data that was on your original disc or cartridge, with all the instructions and resources the console needs to play the game from start to end.
Low-level execution
Before we go deeper, you need to have some understanding of assembly language. An assembly program is a sequence of CPU instructions, and each one may be followed by some data. The instruction set depends on the CPU that is used: the more complex the CPU, the more instructions are available. Each instruction is represented by a number, the opcode, which is one byte long on the simplest systems, which allows up to 256 possible instructions. Modern processors have more instructions, so their opcodes are longer. When the CPU reads an opcode, it decodes it and executes the instruction. If the instruction needs some data, for instance if it is an arithmetic operation, the CPU reads the next byte(s) after the opcode, which must contain this data. Once the operation is done, the processor reads the next opcode (located just after the data for the previous instruction) and the cycle continues indefinitely.
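To make the cycle above more concrete, here is a minimal sketch in Python. The three instructions and their opcode numbers are invented for this example and do not belong to any real CPU; `acc` is a single working register and `pc` is the position of the next byte to read.

```python
# Hypothetical 3-instruction machine, invented for illustration only.
LOAD = 0x01   # LOAD n: put the next byte into the accumulator
ADD = 0x02    # ADD n:  add the next byte to the accumulator
HALT = 0xFF   # HALT:   stop (no data byte)


def run(program: bytes) -> int:
    """Execute the program and return the final accumulator value."""
    acc = 0   # single working register
    pc = 0    # position of the next opcode to read
    while pc < len(program):
        opcode = program[pc]   # fetch the opcode
        pc += 1
        if opcode == LOAD:     # decode, then execute
            acc = program[pc]  # the data sits right after the opcode
            pc += 1            # skip over the data byte
        elif opcode == ADD:
            acc = (acc + program[pc]) & 0xFF   # keep the result on 8 bits
            pc += 1
        elif opcode == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pc - 1}")
    return acc


program = bytes([LOAD, 5, ADD, 7, HALT])
print(run(program))  # prints 12
```

A real instruction set has many more opcodes and several registers, but the loop keeps the same shape: fetch, decode, execute, move on.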
To keep track of its current state (where the next opcode is, what the result of the last operation was, etc.), the CPU has a few memory cells called registers. In the cycle we just described, an important register to understand is the program counter. This register holds the location of the next opcode to execute when the current instruction is finished. It is incremented at each cycle by the length of the opcode plus the length of the data for the instruction, so the CPU knows where the next opcode is. Of course, an instruction may modify the program counter to point to an arbitrary address in memory; that is the case during a function call or when dealing with conditions. Another important register is the stack pointer, which is used for nested function calls: when a function is called, the address to return to is saved on a stack in memory, and the stack pointer tracks the top of that stack. When the function returns, the CPU therefore knows where it was called from, so the program counter can be set to the instruction just after the call and the CPU resumes the execution of the parent function. The CPU also holds a few registers that keep track of the data recently processed or soon to be processed. The results of most instructions go into a specific register for later use.
Along with the registers, the CPU is also able to access the system’s main memory (the RAM), from which it can read data into registers, or to which it can write a register’s data. Once again, the CPU has dedicated instructions to read from and write to the RAM.
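Here is a rough sketch of how the program counter and the stack pointer cooperate during nested calls. The `Cpu` class, its method names, the 256-byte memory and the downward-growing stack are all assumptions made for this example (addresses fit in one byte to keep it short); real systems differ in the details.

```python
class Cpu:
    """Toy CPU with 8-bit addresses, just enough to show call and return."""

    def __init__(self, memory_size: int = 256):
        self.memory = bytearray(memory_size)
        self.pc = 0             # program counter
        self.sp = memory_size   # stack pointer; the stack grows downward

    def push(self, value: int) -> None:
        self.sp -= 1
        self.memory[self.sp] = value & 0xFF

    def pop(self) -> int:
        value = self.memory[self.sp]
        self.sp += 1
        return value

    def call(self, target: int) -> None:
        """Jump to target; pc is assumed to already point past the call."""
        self.push(self.pc)   # remember where to come back to
        self.pc = target

    def ret(self) -> None:
        """Return to the address saved by the matching call."""
        self.pc = self.pop()


cpu = Cpu()
cpu.pc = 0x10
cpu.call(0x40)       # first function, called from 0x10
cpu.pc = 0x42        # pretend a couple of bytes of that function ran
cpu.call(0x80)       # nested call
cpu.ret()
print(hex(cpu.pc))   # prints 0x42
cpu.ret()
print(hex(cpu.pc))   # prints 0x10
```

Because each return address is popped in the reverse order it was pushed, calls can be nested as deep as the stack allows.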
So how does the emulated CPU work? Well, the same way as the original one! The emulated CPU can be represented by a set of variables holding the registers, and a set of functions behaving like the original instructions. It reads the ROM byte after byte, following the program counter, and each time calls the function that corresponds to the opcode it just fetched. The registers are read and updated according to the needs of the executed instruction. In its simplest form, the RAM may be represented by an array in which each value is one byte of the emulated system’s RAM, and the CPU can read or modify any of them.
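One possible way to lay this out in Python is shown below: registers become attributes, each instruction becomes a method, a dictionary maps opcodes to methods, and the RAM is a `bytearray`. The `Machine` class and its five opcodes are hypothetical and only meant to show the structure.

```python
class Machine:
    """Toy emulated CPU: registers as attributes, instructions as methods."""

    def __init__(self, rom: bytes, ram_size: int = 16):
        self.rom = rom
        self.ram = bytearray(ram_size)   # one entry per byte of emulated RAM
        self.a = 0                       # general-purpose register
        self.pc = 0                      # program counter
        self.running = True
        # Opcode numbers are made up for this sketch.
        self.instructions = {
            0x01: self.load_immediate,   # A = next byte
            0x02: self.store_a,          # RAM[next byte] = A
            0x03: self.load_a,           # A = RAM[next byte]
            0x04: self.increment_a,      # A = A + 1
            0xFF: self.halt,
        }

    def next_byte(self) -> int:
        """Read the byte at pc from the ROM and advance pc."""
        value = self.rom[self.pc]
        self.pc += 1
        return value

    def load_immediate(self) -> None:
        self.a = self.next_byte()

    def store_a(self) -> None:
        self.ram[self.next_byte()] = self.a

    def load_a(self) -> None:
        self.a = self.ram[self.next_byte()]

    def increment_a(self) -> None:
        self.a = (self.a + 1) & 0xFF

    def halt(self) -> None:
        self.running = False

    def step(self) -> None:
        opcode = self.next_byte()
        self.instructions[opcode]()   # an unknown opcode raises KeyError

    def run(self) -> None:
        while self.running:
            self.step()


rom = bytes([0x01, 41, 0x04, 0x02, 3, 0x01, 0, 0x03, 3, 0xFF])
machine = Machine(rom)
machine.run()
print(machine.a, machine.ram[3])  # prints 42 42
```

The dictionary lookup is a design choice, not a requirement: a long `if`/`elif` chain or a list indexed by opcode would work just as well.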
Along with the CPU, a console often provides some user devices like a gamepad, a graphical output (the screen on a portable system, or just an HDMI output on modern TV consoles), a sound output (like speakers or a headphone jack), a networking interface (like the link cable on the Game Boy, or Wi-Fi and Ethernet ports on recent consoles), and maybe others. Each of these devices also needs to be emulated, because it has a specific behavior on the original system that you must replicate. For instance, the Game Boy’s screen supports four shades of gray, so your emulator must decode one of these four shades for each pixel from the video data it is given. The Game Boy screen also informs the CPU when it is done rendering a frame, and that has to be emulated too!
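As an illustration of the four shades of gray, the sketch below decodes one row of a Game Boy tile. It relies on the commonly documented tile format (two bytes per row of 8 pixels, the first byte holding the low bit of each pixel and the second the high bit, leftmost pixel in bit 7). The gray values in `SHADES` are arbitrary choices for this example, and the palette register that a real Game Boy applies on top of the color index is left out for brevity.

```python
# Output gray levels, chosen arbitrarily for this sketch (index 0 = lightest).
SHADES = (255, 170, 85, 0)


def decode_tile_row(low: int, high: int) -> list[int]:
    """Return the 8 color indices (0 to 3) of a tile row, leftmost first."""
    pixels = []
    for bit in range(7, -1, -1):   # bit 7 is the leftmost pixel
        lo = (low >> bit) & 1
        hi = (high >> bit) & 1
        pixels.append((hi << 1) | lo)
    return pixels


def row_to_gray(low: int, high: int) -> list[int]:
    """Map each color index of the row to one of the four gray levels."""
    return [SHADES[index] for index in decode_tile_row(low, high)]


print(decode_tile_row(0x3C, 0x7E))  # prints [0, 2, 3, 3, 3, 3, 2, 0]
print(row_to_gray(0x3C, 0x7E))      # prints [255, 85, 0, 0, 0, 0, 85, 255]
```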
Each system being unique, you need specific documentation to be able to emulate every part of it. That’s where one of the biggest difficulties arises. Console manufacturers protect the specifications of their systems, and it’s quite difficult to get good documentation for them. It’s possible to do some reverse engineering to understand how a system works, but it’s a complicated process that takes quite some time. Luckily, some popular devices have become quite well documented, and that’s the case for the two I spoke about here, the CHIP-8 and the Game Boy. You will often need to compare the data across multiple websites and sometimes guess what is not explicitly said, but it’s doable, otherwise there wouldn’t be any emulators!
There is no real limitation on the technology (programming language, devices, etc.) you may use to build an emulator, as long as you can provide the inputs and the outputs we discussed above. Of course, the faster your program is, the more complex a system you’ll be able to emulate. But for starters, it’s better to use a technology you’re familiar with, because you will rarely run into performance limitations with simple systems if you do things well.
This brings me to a point that I have not dealt with yet and that was an enigma for me at the beginning: time. Old consoles used to run on CPUs of a few MHz or less, but today we have machines that are thousands of times more powerful. So the instructions of a game will probably execute much faster than on the original system, won’t they? In that case, the game would run a lot faster, and you usually don’t want that (one obvious drawback is that the sound would be awful). I will not dive into the details here, but to overcome this issue you need to keep track of the emulated time that has elapsed while executing instructions, and regularly pause your emulator so it doesn’t run too many instructions in a given time frame. The way you do this depends on the technology you’re using, but here again, it is not so hard to implement once you understand the issue.
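One common way to do this, sketched below, is to run the instructions of one frame as fast as possible, then sleep for whatever real time is left. The constants are the usual documented Game Boy figures (a clock of 4,194,304 Hz and 70,224 cycles per frame); `run_frame` and its `step` callback are hypothetical names, and `step` is assumed to execute one instruction and return the number of cycles it took on the original hardware.

```python
import time

CPU_HZ = 4_194_304          # Game Boy clock speed, in cycles per second
CYCLES_PER_FRAME = 70_224   # cycles in one Game Boy frame (about 59.7 frames/s)


def run_frame(step, clock=time.perf_counter, sleep=time.sleep) -> int:
    """Run one frame's worth of instructions, then wait out the spare time.

    `step` is a hypothetical callback: it executes one instruction and
    returns how many cycles that instruction costs on the real hardware.
    `clock` and `sleep` can be replaced, which makes the function testable.
    """
    start = clock()
    cycles = 0
    while cycles < CYCLES_PER_FRAME:
        cycles += step()
    emulated_seconds = cycles / CPU_HZ   # time the console would have needed
    elapsed = clock() - start            # time the host actually needed
    if elapsed < emulated_seconds:
        sleep(emulated_seconds - elapsed)
    return cycles
```

Calling `run_frame` in a loop keeps the emulator close to the original speed as long as the host is fast enough. Sleeping once per frame is a simplification: it is coarse but usually sufficient for a simple system, and audio often needs finer-grained synchronization.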
Emulation is a complicated piece of programming, but it’s also a fascinating one for gamers, especially when you dive back into the Super Mario of your childhood. It’s also, in my opinion, an interesting way to understand the inside of a computer at a very low level.