The story of the impossible port: How Quake was ported to the Game Boy Advance
The Game Boy Advance is a handheld games console created by Nintendo. It was released in Japan in 2001 as the successor to the Game Boy Color, and it had an ARM7TDMI CPU clocked at 16.78 MHz, 32KB of internal work RAM, 256KB of external work RAM, and 96KB of VRAM. It’s not the most powerful machine, but many of its games are still held in fond memory. One game that never saw the light of day on the device, though, was a prototype port of Quake, the id Software game that helped define the first-person shooter genre as we know it today.
Quake is an incredibly detailed game with a fantastic soundtrack and addictive gameplay, and just like DOOM, it’s been ported to practically every device you can think of. Its Game Boy Advance port is particularly impressive because the handheld has no native support for 3D graphics, and Nintendo specifically marketed it as a two-dimensional gameplay experience. That didn’t stop Randy Linden from developing his own port, though.
If you’re unfamiliar with Linden, he’s best known as the developer of both bleem! (a PlayStation emulator) and the SNES port of DOOM, an accomplishment that John Romero, co-founder of id Software, once told Shacknews in an interview he didn’t think was possible. Given that track record, if anyone was going to make Quake on the Game Boy Advance a reality, it was probably him.
This port has come to light thanks to Linden’s own release of it through the Forest of Illusion project, which aims to preserve the history of Nintendo’s games. Linden reached out to the project to distribute the copy of the Quake port he found on a 256MB flash card in his possession.
We would like to thank Randy Linden for dedicating time to answering our questions and ensuring the technical accuracy of this article. We would also like to thank Modern Vintage Gamer for allowing us to use any stills from his video that were needed. This port has no official relation to id Software or ZeniMax and was developed as a solo project by Linden.
Quake’s Game Boy Advance port
Technically speaking, it’s a marvel that Quake runs as well as it does on the Game Boy Advance. It runs at a good frame rate and keeps the correct lighting and color palette of the original game. Everything is 3D, including weapons and monsters. Games on the Game Boy Advance typically approximated 3D with sprites, but this is the real deal. It doesn’t use ray casting the way other 3D games on the handheld did, and it even simulates point lighting on pre-rendered objects with a palette-changing trick that creates the illusion of dynamic lighting.
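The article doesn’t describe the exact palette trick, so the following is only a sketch of one common way to fake lighting on an indexed-color display: rather than relighting every pixel, rewrite the palette entries with brighter or dimmer copies of the base colors, and every pixel that uses those indices changes with them. It assumes the Game Boy Advance’s 15-bit BGR555 palette format (5 bits each of red, green and blue); the function names and the 0 to 16 light scale are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: scale one BGR555 color (bits 0-4 red, 5-9 green,
   10-14 blue) by a light level from 0 (black) to 16 (unchanged). */
uint16_t scale_bgr555(uint16_t color, unsigned level)
{
    if (level > 16)
        level = 16;                      /* keep each channel within 5 bits */
    unsigned r = color & 0x1F;
    unsigned g = (color >> 5) & 0x1F;
    unsigned b = (color >> 10) & 0x1F;
    r = (r * level) >> 4;
    g = (g * level) >> 4;
    b = (b * level) >> 4;
    return (uint16_t)(r | (g << 5) | (b << 10));
}

/* Hypothetical helper: build a lit copy of a palette. On hardware the result
   would be copied into palette RAM, so pixels drawn with these indices change
   brightness without any pixel data being rewritten. */
void light_palette(const uint16_t *base, uint16_t *out, int count, unsigned level)
{
    for (int i = 0; i < count; i++)
        out[i] = scale_bgr555(base[i], level);
}
```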
To be clear, this port isn’t the full game. It’s a prototype that Linden intended to take to id Software for release once it was finished. However, the Game Boy Advance’s popularity began to wane, and the custom engine Linden wrote instead became the engine of Cyboid, an entirely separate game he developed. Linden tells us that a “large chunk of the code” is still the original ARM code from the Game Boy Advance version. If you want to try Cyboid, an older version is available on the Google Play Store, but the official APK is now distributed through the Amazon Appstore, because the game contains a lot of low-level 32-bit code and Google Play requires apps to support 64-bit devices.
Linden also shared with us a video of his code running on the iPod Video, which was one of the earliest versions of Cyboid. It was built on the same engine code he used for his Quake port to the Game Boy Advance.
The Game Boy Advance port of Quake doesn’t contain any of the game’s official assets. Linden hasn’t approached either id Software or ZeniMax about distributing the E1M1 version, which does contain official Quake assets.
The game currently being distributed is also a debug build. Holding the R button on boot-up takes the player straight to the game’s second map, and holding left on the D-pad takes them to the third. Maps can also be switched when the player dies, and monsters won’t attack until the player shoots at them.
As for music, the demo uses publicly available .S3M (Scream Tracker 3 module) files, and its sound mixer handles both stereo music and sound effects.
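The article doesn’t go into the mixer’s internals. As a minimal sketch, assuming signed 8-bit samples (the format the Game Boy Advance’s Direct Sound channels play) and hypothetical names throughout, a software mixer sums each channel’s current sample, scales it by a per-side volume, and clamps the result so loud passages saturate instead of wrapping around:

```c
#include <stdint.h>

/* Hypothetical channel state: signed 8-bit sample data plus
   left/right volumes from 0 (silent) to 64 (full). */
typedef struct {
    const int8_t *data;
    int length;
    int pos;
    int vol_left;
    int vol_right;
} channel;

/* Saturate a mixed value to the signed 8-bit output range. */
static int8_t clamp8(int v)
{
    if (v > 127) return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* Mix `count` output frames from all channels into separate left and right
   buffers, advancing each channel by one sample per frame. */
void mix_stereo(channel *ch, int nch, int8_t *left, int8_t *right, int count)
{
    for (int i = 0; i < count; i++) {
        int l = 0, r = 0;
        for (int c = 0; c < nch; c++) {
            if (ch[c].pos >= ch[c].length)
                continue;                        /* channel has finished */
            int s = ch[c].data[ch[c].pos++];
            l += s * ch[c].vol_left;
            r += s * ch[c].vol_right;
        }
        left[i] = clamp8(l / 64);                /* undo the 0..64 volume scale */
        right[i] = clamp8(r / 64);
    }
}
```

Music and sound effects can share one loop like this simply by being different channels.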
The Game Boy Advance imposed a number of limitations that made this a difficult port. Some of the biggest obstacles were the low clock speed, the lack of 3D graphics hardware, and the lack of a floating-point unit (FPU). There were plenty of others along the way, but these were the particular pain points Linden outlined to me. Before we get into them, it’s important to understand the Game Boy Advance’s memory layout.
The Game Boy Advance has three areas of RAM: internal work RAM (IWRAM), external work RAM (EWRAM), and video RAM (VRAM). The 32KB of IWRAM is used for storing ARM instructions for fast execution, whereas the 256KB of EWRAM is better suited to Thumb code and smaller chunks of data. As Rodrigo Copetti notes, EWRAM can be up to six times slower to access than IWRAM. Despite the Game Boy Advance being marketed as a 32-bit handheld, EWRAM, which makes up the bulk of its memory, is only accessible over a 16-bit bus, while IWRAM sits on a 32-bit bus. VRAM comes in at 96KB, and while it’s primarily for graphics data, it’s mapped into the CPU’s address space and can be used as ordinary memory, too.
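To make the layout concrete, here is a small sketch that looks up which region an address falls in, using the standard Game Boy Advance memory map (EWRAM at 0x02000000, IWRAM at 0x03000000, VRAM at 0x06000000) and the sizes and bus widths above. It ignores the mirrored address ranges real hardware exposes, and the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory region: where it starts, how big it is, and its data bus width. */
typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;      /* bytes */
    int bus_bits;       /* data bus width in bits */
} mem_region;

static const mem_region gba_regions[] = {
    { "EWRAM", 0x02000000u, 256u * 1024u, 16 },
    { "IWRAM", 0x03000000u,  32u * 1024u, 32 },
    { "VRAM",  0x06000000u,  96u * 1024u, 16 },
};

/* Return the region containing addr, or NULL if it lies outside these three
   (mirrors are deliberately not modelled). */
const mem_region *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof gba_regions / sizeof gba_regions[0]; i++) {
        const mem_region *r = &gba_regions[i];
        if (addr >= r->base && addr - r->base < r->size)
            return r;
    }
    return NULL;
}
```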
Thumb instructions are a subset of the 32-bit ARM instruction set, encoded into 16-bit words. They cover the most common operations in half the space, which makes them efficient where memory or bus width is tight. This means that even though EWRAM is slower to access, Thumb code stored there can still perform well, because each 16-bit instruction needs only one access over the 16-bit bus where a 32-bit ARM instruction would need two. The downside is that some ARM instructions have no direct Thumb equivalent. In Linden’s port, EWRAM held the output of the 3D transformation code: essentially the list of polygon edges that the rasterization code then traced out scanline by scanline.
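The “six times slower” figure falls out of bus width and wait states. The rough model below assumes IWRAM has a 32-bit bus with no wait states and EWRAM a 16-bit bus with its default two wait states, and it ignores the distinction between sequential and non-sequential accesses, so treat it as an approximation rather than exact timing.

```c
/* Cycles to fetch one instruction `insn_bits` wide from memory with the given
   bus width and wait states. A fetch wider than the bus is split into several
   accesses, each costing one cycle plus the wait states (simplified model). */
int fetch_cycles(int insn_bits, int bus_bits, int wait_states)
{
    int accesses = (insn_bits + bus_bits - 1) / bus_bits;
    return accesses * (1 + wait_states);
}
```

Under these assumptions a 32-bit ARM instruction costs one cycle to fetch from IWRAM but six from EWRAM, while a 16-bit Thumb instruction from EWRAM costs three, which is why Thumb is the better fit for the slower, narrower memory.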
As Linden tells me, the most complex and difficult part of the entire port was the scanline renderer. It consists of over 10,000 lines of highly optimized ARM assembly that draws pixels into VRAM, and it used up most of the 32KB of IWRAM. Only the edges closest to the camera are kept active and rendered, and the level data it works from is organized as a large Binary Space Partitioning (BSP) tree. Because there wasn’t enough IWRAM, the results of the polygon transformations were stored in edge tables in VRAM, which on the Game Boy Advance is still faster than EWRAM. VRAM also held the graphics being displayed.
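Linden’s renderer is thousands of lines of hand-written ARM assembly, so the sketch below only illustrates the general idea behind tracing a polygon edge scanline by scanline: instead of recomputing the edge’s x position on every row, step it by a constant fixed-point slope. The 16.16 format, the struct and the function names are assumptions made for illustration.

```c
#include <stdint.h>

/* One polygon edge, with x kept in 16.16 fixed point (hypothetical layout). */
typedef struct {
    int32_t x;        /* current x, 16.16 */
    int32_t dxdy;     /* change in x per scanline, 16.16 */
    int y;            /* current scanline */
    int y_end;        /* first scanline below the edge */
} edge;

/* Build an edge from (x0, y0) down to (x1, y1); assumes y1 > y0 and on-screen
   coordinates. The ARM7TDMI has no divide instruction, so a real renderer would
   likely replace this division with a reciprocal look-up table. */
edge edge_make(int x0, int y0, int x1, int y1)
{
    edge e;
    e.x = (int32_t)x0 * 65536;
    e.dxdy = ((int32_t)(x1 - x0) * 65536) / (y1 - y0);
    e.y = y0;
    e.y_end = y1;
    return e;
}

/* Store the integer x of the edge on each scanline it covers in xs[], indexed
   from the edge's top row, and return the number of rows traced. Assumes x
   stays non-negative, as it does for on-screen edges. */
int edge_trace(edge e, int *xs)
{
    int n = 0;
    for (; e.y < e.y_end; e.y++) {
        xs[n++] = e.x >> 16;   /* integer part of x */
        e.x += e.dxdy;         /* one addition per scanline */
    }
    return n;
}
```

The spans between a left and a right edge on the same scanline are then what actually gets filled with pixels.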
Linden spent a lot of time on optimization to get the fastest execution time possible. Three of the techniques he used to speed things up were:
- Modified code just before it was executed (self-modifying code), so fewer instructions were required
- Used a series of look-up tables for things like reciprocal, sine, cosine, tangent, etc.
- Switched the CPU “mode” to gain access to additional registers (which act like very fast “variables”) without having to save and restore their values.
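As a minimal sketch of the look-up table idea, assuming 16.16 fixed point and hypothetical names, a reciprocal table turns a division (which the ARM7TDMI has no instruction for) into one multiply and a shift. On the cartridge such a table would be precomputed into ROM; here it is built at startup for simplicity. Sine, cosine and tangent tables work the same way, indexed by angle.

```c
#include <stdint.h>

#define RECIP_MAX 256

/* recip[d] = ceil(65536 / d), a 16.16 reciprocal; index 0 is unused. */
static uint32_t recip[RECIP_MAX];

void recip_init(void)
{
    for (uint32_t d = 1; d < RECIP_MAX; d++)
        recip[d] = (65536u + d - 1) / d;   /* round up so small quotients are exact */
}

/* Approximate x / d for 1 <= d < RECIP_MAX with one multiply and a shift.
   The result is exact while x < 65536 / d; beyond that it can overshoot. */
uint32_t div_lut(uint32_t x, uint32_t d)
{
    return (uint32_t)(((uint64_t)x * recip[d]) >> 16);
}
```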
Switching CPU modes to gain additional registers is an incredibly clever maneuver: it keeps more values right inside the CPU, where they can be retrieved in a single clock cycle. As Linden tells me, he could switch modes and retrieve a value in one clock cycle, whereas storing a value in the Game Boy Advance’s RAM and reading it back takes longer. The CPU runs at roughly 16.78 MHz (precisely 2^24, or 16,777,216, cycles per second). That sounds like a lot, but when you need to calculate and draw every pixel on the screen, those cycles add up quickly, and it becomes important to shave off as many operations as you can.
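To put the clock speed in perspective, here is a back-of-the-envelope budget. The Game Boy Advance’s CPU clock is 2^24 Hz and its screen is 240 by 160 pixels; the frame rates used below are hypothetical, since the article doesn’t give the port’s exact frame rate.

```c
#include <stdint.h>

#define GBA_CPU_HZ  16777216u        /* 2^24 Hz */
#define GBA_PIXELS  (240u * 160u)    /* 38,400 pixels per frame */

/* Whole CPU cycles available per frame at a given frame rate. */
uint32_t cycles_per_frame(uint32_t fps)
{
    return GBA_CPU_HZ / fps;
}

/* Whole cycles per pixel if the entire frame budget went to drawing. */
uint32_t cycles_per_pixel(uint32_t fps)
{
    return cycles_per_frame(fps) / GBA_PIXELS;
}
```

At a hypothetical 30 frames per second that leaves about 14 cycles per pixel before any game logic, sound mixing or 3D transformation is accounted for, which is why a single cycle saved in an inner loop matters.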
The image above lists the general registers of the ARM7TDMI CPU inside the Game Boy Advance. Typically, developers only ever use the registers available in the “System and User” mode and fall back on normal variables in memory beyond that. Linden, however, made use of registers in all seven of the CPU’s modes, and the best part is that switching modes preserves the values held in the other modes’ registers, so he could switch back and forth between them.
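The sketch below is a simplified C model of ARM7TDMI register banking; it ignores the CPSR and SPSR status registers, and the names are illustrative. FIQ mode has its own copies of r8 to r14, the IRQ, Supervisor, Abort and Undefined modes each have their own r13 and r14, and User and System mode share the base set, so values parked in one mode’s banked registers survive while code runs in another. On hardware, the switch itself is a single MSR write to the mode bits of the CPSR.

```c
#include <stdint.h>

/* ARM7TDMI mode numbers, as stored in CPSR bits 0-4. */
typedef enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
} cpu_mode;

static uint32_t base_regs[16];   /* r0-r15 as seen in User/System mode */
static uint32_t fiq_regs[7];     /* r8_fiq to r14_fiq */
static uint32_t irq_regs[2], svc_regs[2], abt_regs[2], und_regs[2]; /* r13, r14 */

/* Return the physical storage behind register r (0-15) in the given mode. */
uint32_t *reg(cpu_mode mode, int r)
{
    if (mode == MODE_FIQ && r >= 8 && r <= 14)
        return &fiq_regs[r - 8];
    if (r == 13 || r == 14) {
        switch (mode) {
        case MODE_IRQ: return &irq_regs[r - 13];
        case MODE_SVC: return &svc_regs[r - 13];
        case MODE_ABT: return &abt_regs[r - 13];
        case MODE_UND: return &und_regs[r - 13];
        default: break;
        }
    }
    return &base_regs[r];        /* shared (unbanked) register */
}
```

Counting the banked copies, that is 7 + 4 × 2 = 15 extra 32-bit registers beyond the 16 visible in User mode.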
Funnily enough, Linden also mentioned that his register bank switching unearthed a bug in the NanoBoyAdvance emulator. As it turned out, that emulator didn’t support storing values in the registers of the other CPU modes and switching between them, and his Quake demo was the first known game to actually do so.
Linden shared a photo of some of his notes with us and explained how he optimized his calculations in the absence of a proper FPU.
The image above comes from those notes, and what’s particularly interesting is the section on “miscellaneous ARM cycle instruction counts”. Linden worked out how to arrange calculations so that they took as few clock cycles as possible. As he explained it, the cost of a multiply depends on the size of the operand: per ARM’s documented timings for the ARM7TDMI, multiplying by a value that fits in 8 bits takes one extra clock cycle, 16 bits two, 24 bits three, and a full 32-bit value four.
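Without an FPU, fractional math is usually done in fixed point, where a multiply is an ordinary integer multiply followed by a shift, which is one reason multiply timing mattered so much. The article doesn’t say which number format Linden used; this sketch assumes 16.16 and uses hypothetical names.

```c
#include <stdint.h>

typedef int32_t fixed;              /* 16.16: 16 integer bits, 16 fraction bits */
#define FX_ONE ((fixed)1 << 16)

/* Convert a whole number to 16.16. */
fixed fx_from_int(int32_t n)
{
    return n * FX_ONE;
}

/* Multiply two 16.16 values: widen to 64 bits so the product can't overflow,
   then shift the extra 16 fraction bits back out. On ARM this maps naturally
   onto the SMULL long-multiply instruction. */
fixed fx_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) >> 16);
}
```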
“There were two or three stages of execution [in the ARM processor]. Say for example I multiply register one by register two and put the result into register three. If I knew that register two was a 16-bit number instead of saying multiply register one by register two, I would flip it and I would say multiply register two by register one because that would save me a clock cycle.”
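ARM’s documented timings for the ARM7TDMI explain why operand order matters: MUL takes one sequential cycle plus 1 to 4 internal cycles, and the internal count depends only on the multiplier operand (Rs), finishing early when its top bits are all zeros or all ones. The sketch below models that rule while ignoring memory wait states; the helper names are hypothetical.

```c
#include <stdint.h>

/* Internal cycles (m) the ARM7TDMI spends on MUL for multiplier operand rs. */
int mul_internal_cycles(uint32_t rs)
{
    if ((rs >> 8) == 0 || (rs >> 8) == 0x00FFFFFFu) return 1;   /* top 24 bits all 0s or 1s */
    if ((rs >> 16) == 0 || (rs >> 16) == 0x0000FFFFu) return 2; /* top 16 bits all 0s or 1s */
    if ((rs >> 24) == 0 || (rs >> 24) == 0x000000FFu) return 3; /* top 8 bits all 0s or 1s */
    return 4;
}

/* Total MUL cycles (1S + mI) with `rs` in the multiplier slot. */
int mul_cycles(uint32_t rs)
{
    return 1 + mul_internal_cycles(rs);
}

/* Of two operands, return the one that is cheaper to use as rs. */
uint32_t cheaper_multiplier(uint32_t a, uint32_t b)
{
    return mul_cycles(a) <= mul_cycles(b) ? a : b;
}
```

Under this model, multiplying a full 32-bit value by a 16-bit one costs three cycles with the 16-bit value as Rs and five the other way round, the kind of saving Linden describes.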
He told me that he did this to squeeze every bit of performance out of the Game Boy Advance, as a clock cycle saved here and there really adds up when a lot of calculations are being performed. As for the self-modifying code, I asked Linden to explain how it worked.
“The program comes from [storage], it transfers a big block of the program into internal RAM for execution because it’s faster. Each RAM access is much, much slower so I do a DMA [Direct Memory Access] of a big block from ROM into RAM, and then I change the actual program code. For example, ARM has the ability to shift operands left or right or it can mask off certain bits as part of the instruction set. The instruction specifies which bits you’re going to mask or how many bits you’re going to shift by. So, I would generate code that would modify what was just about to be executed based on how many bits I needed to shift. Another example is with regards to 3D matrix multiplication. There are a whole bunch of multiplications involved there. I would generate the actual instructions that are doing the multiplications into the internal RAM and then execute them so the code sort of built portions of itself while it was running.”
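To illustrate the shift example from the quote, the sketch below patches the 5-bit shift-amount field (bits 7 to 11) of an ARM data-processing instruction that uses an immediate shift, such as MOV r0, r1, LSL #n (0xE1A00181 encodes MOV r0, r1, LSL #3). It only computes the new instruction word; on the Game Boy Advance that word would then be written into the copy of the routine in IWRAM before the instruction is fetched. The ARM7TDMI has no instruction cache, so no cache flush is needed, but an instruction its three-stage pipeline has already fetched won’t see the change. The function name is hypothetical.

```c
#include <stdint.h>

#define SHIFT_AMOUNT_MASK (0x1Fu << 7)   /* bits 7-11: immediate shift amount */

/* Return a copy of an ARM data-processing instruction with its immediate shift
   amount replaced by `amount` (0-31). Assumes the instruction uses an
   immediate-shifted register operand (bit 25 = 0 and bit 4 = 0). */
uint32_t patch_shift_amount(uint32_t insn, unsigned amount)
{
    return (insn & ~SHIFT_AMOUNT_MASK) | ((amount & 0x1Fu) << 7);
}
```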
Self-modifying code has its own downsides, particularly when it comes to debugging. On the upside, it removes the need for branch instructions, which jump to another sequence of code and, because they force the CPU to refill its pipeline, cost precious cycles. Linden also told us that the look-up tables are aligned in the ROM so that each table’s base address is an eight-bit value shifted left. The tables are far too large to fit into RAM, and this alignment avoids the need for an extra load instruction to get a table’s base address.
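This alignment likely works because of how ARM encodes immediate operands: an 8-bit value rotated right by an even number of bits (0 to 30). A table whose base address has that form, such as 0x08100000 (0x81 shifted left by 20), can be put in a register with a single MOV rather than a load from a literal pool. The checker below is a sketch with a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v can be encoded as an ARM data-processing immediate, i.e. an 8-bit
   value rotated right by an even amount. Undoing that rotation (rotating left)
   must leave only the low 8 bits set. */
bool is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```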
All in all, the final prototype was developed over nearly two years.
The future of Randy Linden’s Quake port
I asked Linden what the future holds for the Quake port, and he told me he was considering asking ZeniMax and id Software about releasing the version with official Quake assets. He also plans to release the source code at some point, but it currently doesn’t build, as building it requires an older computer.
I asked Linden why he chose Quake, and he told me that he loved the game and loved the challenge of this being the “impossible project”, coming off the back of his DOOM port for the SNES. He also mentioned that while he doesn’t believe the entire game could have been ported due to space constraints, the vast majority of it could have run on the same engine.
If you’re interested in trying Quake for the Game Boy Advance, be sure to check out its release on Forest of Illusion.