Overlay Corruption

Occasionally, we see transition corruptions that create holes at seemingly random places within the current room. When you first encounter this, it sounds like an unpredictable and nonsensical consequence, but once you understand the underlying mechanics, this outcome quickly becomes the most intuitive answer.

Stars in his eyes

We can see why the creation of holes is obvious if we look at the most basic consequence of these transition corruptions. When transitioning in the underworld, whether between rooms or within them, the dungeon submodule is changed by incrementing the value at its address, from $00 to $01 (intraroom) or $02 (interroom), rather than by writing the new value directly. This allows us to combine those changes, putting the value at $03: the star switch overlay submodule.
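The arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea, not the game's code: the constant and function names are made up here, and only the values $01, $02, and $03 come from the text.

```python
# Hypothetical names; only the numeric values come from the text.
INTRAROOM = 0x01      # transition within a room
INTERROOM = 0x02      # transition between rooms
STAR_OVERLAY = 0x03   # star switch overlay submodule

def queue_transition(submodule: int, kind: int) -> int:
    """Model a transition that increments the submodule instead of storing it."""
    increment = 1 if kind == INTRAROOM else 2
    return submodule + increment

submodule = 0x00
submodule = queue_transition(submodule, INTRAROOM)  # now $01
submodule = queue_transition(submodule, INTERROOM)  # now $03, not $02
```

Had the game stored the value outright, the second transition would simply have overwritten the first.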

It should have hit you just now that hole creation is an obvious consequence of these glitches. Stepping on a star tile creates and/or removes holes from the room, using what is called an overlay (hence the name of this Explication). But we'll need to dive deeper to understand the specifics of what's happening and why the location of these holes is actually very predictable.

The Write Stripes

Unlike the overworld, rooms in the underworld are not stored as full tile maps in ROM. Everything, including the walls and doors, is created with an object that is identified with a number and given properties to denote its position and size. The basic wall layouts are split into 8 different packages; doors, which are simpler and handled separately, have fewer properties, but all these are still created as objects in much the same way.

Upon loading a room, the two things we need to pay attention to are the 24-bit data pointer at $B7 and the stripes buffer at $00:1100.

The 24-bit data pointer is self-explanatory: it's a 3 byte pointer to the room's data in ROM.

The stripes buffer is a large block of memory reserved for a format Nintendo used for chunked VRAM updates. A stripe is just a small block of data that designates a size, address, direction, and format to write a subsequent block of data. The address identifies the location in video memory (the VRAM address) where data should be written, and the direction identifies whether it will behave as a horizontal change to a tile map or a vertical one. The size is the number of bytes written, and the format decides whether a single value should be written that many times (run-length encoding, RLE) or if the data is an explicit series of values.
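To make the four fields concrete, here is a small Python decoder for a stripe-like format. The byte and bit layout used below (a 2-byte address, then a 2-byte control word with a direction bit, an RLE bit, and a byte count) is an assumption chosen for illustration and should not be read as the game's exact encoding. The Stripe class and parse_stripes function are likewise hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    vram_address: int
    vertical: bool
    values: list

def parse_stripes(buf: bytes) -> list:
    """Decode stripes until the $FFFF sentinel (assumed layout, see lead-in)."""
    stripes = []
    i = 0
    while i + 4 <= len(buf) and buf[i:i + 2] != b"\xff\xff":
        vram_address = (buf[i] << 8) | buf[i + 1]
        control = (buf[i + 2] << 8) | buf[i + 3]
        vertical = bool(control & 0x8000)   # direction
        rle = bool(control & 0x4000)        # format
        size = control & 0x3FFF             # bytes of tile data to write
        i += 4
        count = size // 2                   # tiles are 16-bit words
        if rle:
            # One value, repeated to fill the requested size.
            word = buf[i] | (buf[i + 1] << 8)
            values = [word] * count
            i += 2
        else:
            # An explicit series of values.
            values = [buf[i + 2 * k] | (buf[i + 2 * k + 1] << 8) for k in range(count)]
            i += size
        stripes.append(Stripe(vram_address, vertical, values))
    return stripes

example = bytes([
    0x10, 0x00, 0x40, 0x04, 0xEE, 0x00,              # RLE, horizontal
    0x12, 0x34, 0x80, 0x04, 0x01, 0x00, 0x02, 0x00,  # explicit, vertical
    0xFF, 0xFF,                                      # sentinel
])
decoded = parse_stripes(example)
```

The important property for what follows is that the decoder trusts the buffer completely: whatever address and size it finds, it uses.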

Holes and floors generated during an overlay change are handled exactly the same way as the rest of the room. They're all objects from a list that modify the tile map and build stripes to update VRAM.

So why does the overlay submodule break when we reach it through unintended means?

Foot on the trigger

The missing piece is hopefully obvious: it's the star tiles. By itself, the star tile doesn't do anything particularly interesting; however, with a corresponding room tag, it can trigger the holes overlay submodule. The room tag overseeing star tiles (and chests, such as the Skull Woods compass) identifies the specific overlay to draw when the trigger is hit. There are 19 hole overlays in all, with 2 unpaired and 8 paired (and 2 of the tags operate overlapping pairs).

When triggered, two important actions happen to initialize the submodule: an address for indexing data is reset to $00, and an address identifying the overlay index is updated. On the next frame, the overlay submodule is in control.

This submodule has some odd, likely vestigial functionality: despite being handled in a single frame of gameplay, the routine that applies the overlay accounts for an already set data index. In theory, this gives it the ability to handle overlays split up across multiple frames of gameplay, but, in practice, it only acts as the root cause of our glitch.

When the overlay application routine is entered, it checks $BA for the index used with the 24-bit data pointer. If it's zero, it uses the overlay index to find the appropriate ROM address for the overlay's data. Then, it updates a tilemap buffer in WRAM at $7E:2000 with the new data.

If it's not zero, it continues as is, using the index it has just read and whatever pointer is already located in $B7. If we enter the overlay submodule not via an intended trigger, we're pretty much guaranteed to have a nonzero index.
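The branch described above can be modeled in a few lines of Python. This is a sketch of the decision only. The function name, the overlay pointer table, and the addresses in it are hypothetical; the pointer and index in the glitched call are just example values.

```python
def overlay_data_source(data_index, current_pointer, overlay_index, overlay_pointers):
    """Return the (pointer, index) pair the overlay routine would read from."""
    if data_index == 0:
        # Intended path: look up the overlay's own data and start at its beginning.
        return overlay_pointers[overlay_index], 0
    # Unintended path: trust whatever pointer and index are lying around.
    return current_pointer, data_index

# Hypothetical overlay table; these addresses are placeholders.
overlay_pointers = [0x00AAAA, 0x00BBBB]

intended = overlay_data_source(0x0000, 0x1F8BC2, 1, overlay_pointers)
glitched = overlay_data_source(0x0120, 0x1F8BC2, 1, overlay_pointers)
```

In the glitched call, the overlay index is ignored entirely, which is why the holes have nothing to do with any real overlay.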

Where did I come from?

With that in mind, it's fairly predictable where we get our data from. In the case of most freshly loaded rooms, our pointer and index will be inherited from the room draw routine. As an example, let's look at the entrance to Desert Palace.

When we enter the room, and it's finished loading, $B7 points to $1F:8BC2. This is the start of the next room's data. That makes sense; we just got to the end of this room's data.

Our index, $BA, has ended up with the value $0120. It really likes landing on this value, mostly because the address is reused to index block and torch data, but that's not the only value it can take.

If we were to enter the overlay module now, the first data we read will be at $1F:8CE2. This is misaligned with the proper data due to the object array header, but, nonetheless, it is interpreted as a list of objects intended to only be holes and floors. This loop keeps going until it encounters the sentinel (terminating) value $FFFF. For Desert Lobby, this happens to terminate quickly. At $1F:8D77, there are four consecutive $FF values, making it impossible to not read the sentinel.

Notably, the address $B7 is also used to control palette fading, contributing to different values depending on how you entered the room. When a fade is used—from dark rooms, stairs, or falling—it always seems to settle at $E8BC. This means we can broadly create three categories for fade transitions based on which ROM bank the room's data is in.

To decide what tiles to modify and redraw with stripes, each object in the list is passed to a routine that builds stripes for a VRAM update based on the tilemap buffer. Following that, there's a routine that looks at the 4-by-4 tile area at the object's coordinates and modifies the tile properties of those tiles in another table. When modifying this property table, there are only two options: hole or floor. To determine which, the corresponding entry in the tilemap buffer is looked at, and the value is masked with $03FE. If the result is $00EE or $00FE (corresponding to the type 0 floor of the tileset), then the new tile is a floor; otherwise, it is a hole.
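The floor test is a single mask and compare, sketched here in Python. The function name is hypothetical; the mask and the two floor values come straight from the paragraph above.

```python
FLOOR_MASK = 0x03FE
FLOOR_VALUES = (0x00EE, 0x00FE)  # type 0 floor of the tileset

def is_floor(tilemap_entry: int) -> bool:
    """Decide floor or hole from a 16-bit entry in the tilemap buffer."""
    return (tilemap_entry & FLOOR_MASK) in FLOOR_VALUES

def new_tile_property(tilemap_entry: int) -> str:
    return "floor" if is_floor(tilemap_entry) else "hole"
```

Because the mask drops bit 0 and everything above bit 9, entries such as $00EF or $1CFE still count as floor, while anything with a different character name becomes a hole.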

If we've entered this submodule incorrectly, the necessary tilemap update will never have happened. As such, all standard floor tiles will remain floor tiles, and anything else caught in the boundaries of one of these new "hole" objects will be turned into a pit.

A step too far

The most dangerous results come from the sentinel value mentioned above not being encountered quickly enough. The stripes buffer is only allotted about 2 kilobytes; the memory following it is used for other things. For every 3 bytes read from ROM, 48 are written to RAM, so it only takes around 45 objects (roughly 135 bytes of ROM data) to overflow the buffer. It's not difficult to find that many bytes to read. What halts us most of the time is a room having only a single layer. This places the $FFFF that signifies the end of layer 2 immediately after the one signifying the end of layer 1. Four $FF bytes in a row make the sentinel impossible to avoid if we advance 3 bytes at a time.

There's actually not much interesting that lies beyond this large buffer. Textbox control is here, but that's always reinitialized when a new textbox is called up. Other reconfigured addresses include sprite property caches and polyhedral controls. There are a few arrays related to doors that might do something interesting, but perhaps most famously: the mirror's coordinates. Well, most famous of the useful stuff. It's when we go even further that the wonky problems happen.

Also important to these problems is the fact that the stripes buffer is accessed outside of bank 7E. The first 8 kilobytes of each bank we're working in mirror the first 8 kilobytes of bank 7E. The remainder of the bank is not WRAM. Page $20 is open bus, but page $21 contains registers used to communicate with the PPU.
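As a rough picture of where a runaway write ends up, the Python sketch below classifies a write by its offset from the start of the buffer at $1100. It only covers the regions named in this paragraph, and the function name and region labels are made up for the example.

```python
STRIPES_BUFFER = 0x1100  # buffer address given earlier in the text

def classify_write(offset: int) -> str:
    """Name the region a write lands in, given its offset into the buffer."""
    address = STRIPES_BUFFER + offset
    if address < 0x2000:
        return "WRAM mirror"               # first 8 kilobytes of the bank
    if address < 0x2100:
        return "open bus (page $20)"
    if address < 0x2200:
        return "PPU registers (page $21)"
    return "beyond page $21"
```

With 48 bytes written per object, an offset of $0F00 (the end of the WRAM mirror) is reached after 80 objects.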

While we call these broken graphics "VRAM corruption", for the most part, it is not actually a problem with VRAM. The real problem is that the registers telling the PPU where to find tilemap and character data for the backgrounds and sprites have been written to with garbage values.

Many registers are rewritten every frame from values saved in WRAM, but these tilemap and character registers are only written when the game boots up. There are only a few of these registers, and they're expected to take on specific values, so writing a fix is trivial. The reason it looks like an unrecoverable mess is that these different types of data are of completely different formats.

If we keep going, we may reach CPU registers on page $42, most dangerously the NMI and IRQ enable flags or the DMA and HDMA enable registers. For the latter, most of the writes will simply not work, unless an earlier unintended write happened to enable force blank. The NMI and IRQ are what really mess things up out of these registers. NMI is what keeps the game running at 60 frames per second, and it's what updates everything on the screen. IRQ is used for precise timing within a frame, but it's only intended for the triforce and crystal animations, and scrolling the background during the credits.

Depending on your console, the DMA activation can cause a hardware crash. Certain revisions of the SNES do not behave well if an HDMA is triggered on the same cycle as a DMA. This can occur even with corruption that doesn't find its way this far. And despite claims of accuracy that exceeds software emulators, the SuperNT can't seem to handle these writes and crashes during overlay corruption fairly often (it probably uses the space for its own interfaces). All modern software emulators handle open bus fine.

If we reach page $42, we're usually screwed, but there's nothing that stops the routine from just going further and further. The next 16 kilobytes are harmless open bus writes. Following that, 32 kilobytes of ROM code. Writing here does nothing; that's the "RO" in "ROM". But that isn't very important, because the setup writes to this buffer can't seem to reach this far in practice. Every bank containing data we reasonably expect to be read has at least four consecutive $FF bytes at the end, and many places before.

As an example of some really bad behavior, let's look at overlay corruption in Uncle Passage. This one goes really far and ends up writing to CPU registers. The first one it writes is a $15 to $4200. This is the interrupt enable register, and the write disables NMI and enables IRQ to trigger at certain horizontal positions. The next write to take note of is the $3C to $4207 and the subsequent $22 to $4208. This sets the aforementioned horizontal IRQ trigger to 55. The vertical trigger is eventually set as well, but we don't have that trigger enabled, so it doesn't matter. Immediately after the IRQ horizontal trigger write, there's a $15 sent to $420C, the HDMA enable field. Channels 0, 2, and 4 are now running HDMA with garbage values. None of these channels are expected to handle HDMA in this game, only channels 6 and 7. And to make matters worse, a $08 sent to $420B enables channel 3 for a DMA transfer. The most devastating part of this is that because this channel is guaranteed to have completed a transfer on the previous frame, it has a transfer size of $0000, which actually behaves as a 65,536 byte write. A long while later, we'll also write new garbage to the HDMA properties, changing where and when they write.
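The register values in this example can be unpacked with a little Python. The bit meanings used here follow the standard descriptions of $4200 (bit 7 enables NMI, bits 4 and 5 enable the horizontal and vertical IRQ, bit 0 enables automatic joypad reading), of $420B and $420C (one bit per channel), and of the DMA size register (zero means 65,536 bytes). The function names are made up for this sketch.

```python
def decode_interrupt_enable(value: int) -> dict:
    """Split a write to $4200 into its flags."""
    return {
        "nmi": bool(value & 0x80),
        "vertical_irq": bool(value & 0x20),
        "horizontal_irq": bool(value & 0x10),
        "auto_joypad": bool(value & 0x01),
    }

def enabled_channels(value: int) -> list:
    """List the channels selected by a write to $420B or $420C."""
    return [channel for channel in range(8) if value & (1 << channel)]

def dma_transfer_length(size_register: int) -> int:
    """A size of $0000 is treated as a full 65,536 bytes."""
    return size_register if size_register != 0 else 0x10000

interrupts = decode_interrupt_enable(0x15)
hdma = enabled_channels(0x15)
dma = enabled_channels(0x08)
length = dma_transfer_length(0x0000)
```

The same $15 byte means something quite different in each register, which is typical of what happens when tile data gets sprayed across control registers.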

The end result of all this is that the CPU is stalled at the end of a frame of game play, waiting for an interrupt that will never occur. Instead, it gets a different interrupt firing off 262 times every frame—that's over 15,000 times every second. The IRQ routine eats up about 25% of every scanline just to do nothing but exit. Half the screen is black, because channel 4 is enabling force blank there; only by pure coincidence does it make the screen visible again at the top of the frame. Channels 2 and 4 are writing to the beam position registers. As those are read only, these channels are doing absolutely nothing of effect.

You can see why really long writes easily bring the game to a halt.

But this is still only half of the story.

Remember the reason we're even writing to this buffer: we're setting up a bunch of transfers to VRAM. These transfers are built from the tile map that wasn't updated like it should have been in the beginning. Data is copied verbatim from there, so while we're writing a bunch of blocks of graphics data, they're exactly what's already there. As such, it's mostly fine, but once we hit open bus during the reading, most of the writes will be 17 byte transfers to the same address in VRAM, due to the nature of open bus. Larger, more devastating writes can potentially occur when we read certain registers.

Like the object data, this buffer expects a sentinel $FFFF. If we've hit open bus, we won't stop until we reach $00:4307. When reading open bus, the value that comes back is determined by the last byte used on the A bus. In this case, that byte is the $11 that forms the high byte of the operand address in the instruction performing the read. We will not see our sentinel anywhere between $2000 and $4306, so we're guaranteed a good time reading. What saves us is that $4307 and $4308 are HDMA registers for channel 0, which are initialized to $FF. This channel is never used for HDMA by this game, so those initialization values should stay put. None of the reads in between make any difference as to whether we'll stop; it's guaranteed we will. Even if that HDMA register somehow fails, we still have null bytes at $00:89C2, a ROM address, to save us (we hope); it would only take much longer. The difference between reading and writing means we can easily read garbage much further than we can write garbage. Writing to this address range can be stopped at any time, assuming the ROM allows it.

This is where most of the actual VRAM corruption occurs, but it's pretty temporary. Mirroring or entering the underworld will fix most of the graphics and tile maps, making things look normal again. If background 3 tiles (used for the HUD and text) get corrupted, it can last longer, but those are still eventually reinitialized before, during, and after file select and on death. The only truly permanent damage that requires power cycling the console is that endured by the character and tile map pointers.

Fake mirror

The most interesting thing we can look at is the mirror's coordinates. The fun thing about this portal is that it always exists on the overworld, even if you don't have the mirror. There's no flag to disable the mirror portal, just a special case for coordinate {0,0}. But even there, the sprite exists and is functional. If you turn on out-of-bounds mode and walk into the northwest corner of Lost Woods, you'll trigger a warp.

The behavior of overlay buffer writes is always the same, which helps us narrow down where the coordinates come from:

The guaranteed consistency of the X high byte means that the mirror portal will always end up in the same column of screens as Link's house. The low bytes of each coordinate are less important, only determining the precise position. But they're only less important because of the inconsistent and relatively unpredictable nature of the Y high byte. Any value higher than $0F is off the map. If the value is random (it's not, but we'll assume it is to illustrate a point), the portal is only in bounds 1/16 of the time.

There also exists a small caveat in positions due to the palette allocation: at least one of bits 3 and 4 must be set. This is because bits 2, 3, and 4 determine the palette of a tile, and the space for palettes 0 and 1 is used by the HUD. Thus, the smallest coordinate that can normally be had is $08. The one exception to this is in rooms with a transparent floor. With those floors, the palette doesn't matter, so it just uses palette 0. That and the character name of those floor tiles give an exception to the minimum: $01. But that's it. There's nothing in between. On a similar note, these bytes can never have bit 1 set. Tile names are 10 bits in length, but this game only uses half of that space.

For the Y high byte, this puts the portal's furthest possible position north just inside the Hyrule Castle courtyard. Or with transparent floors, just below the Tower of Hera. The character name limitation means we'll never see values of $0A or $0B (Link's house); or $0E or $0F (east of dam). The only screens we will see an in-bounds mirror portal on are: Tower of Hera ($01), Hyrule Castle ($08 or $09), and south of Link's house ($0C or $0D).
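The list of reachable screens falls out of the bit rules directly. The Python sketch below enumerates every in-bounds high byte ($00 through $0F) and keeps only those that satisfy the constraints described above; the function name is made up for the example.

```python
def possible_y_high_bytes(transparent_floor: bool = False) -> list:
    """Enumerate in-bounds Y high bytes allowed by the tile attribute rules."""
    allowed = []
    for value in range(0x10):        # anything above $0F is off the map
        if value & 0x02:
            continue                 # bit 1 is never set
        if value & 0x18:
            allowed.append(value)    # at least one of bits 3 and 4 is set
        elif transparent_floor and value == 0x01:
            allowed.append(value)    # the lone palette 0 exception
    return allowed

normal = possible_y_high_bytes()
with_transparent = possible_y_high_bytes(transparent_floor=True)
```

That leaves $08, $09, $0C, and $0D in ordinary rooms, plus $01 when the floor is transparent.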

Summary

Overlay corruption occurs as an unintended entry into the overlay submodule via various transition corruptions. The location of the holes comes from misaligned room data or unrelated code and data. Per room, these holes are consistent, but the exact effect is also influenced by fading transitions and previous overlay changes within the same room.

Most corrupted WRAM data is uninteresting or reinitialized before it's needed. The only effect of note is moving the coordinates of the mirror portal, but with very limited application. More often than not, the portal is just moved to an unreachable location off the map.

"VRAM corruption" is primarily due to garbage writes to registers that tell the video chip where to find character and tilemap data. These pointers are only written when the game boots up, so they stay broken until a hard reset. Most of the actual corruption in video memory is cleaned up after transitions.

Game crashes occur when CPU control registers are written to with garbage. If corruption reaches that far, it's guaranteed to cause problems.

Some people with older consoles have problems with crashing in general, because of a hardware glitch that occurs when DMA and HDMA are running at the same time. These crashes can occur even in cases where a newer console would remain stable.

Bombastic FPGA-based emulators fail to live up to their claims of accuracy and often crash, because they weren't programmed correctly to deal with undefined open bus behaviors.