Overlay Corruption

Occasionally, we see transition corruptions that create invisible holes at random places within the current room. When reduced to a description like that, this doesn't make any sense. Why holes? Why are they invisible? What's with their shape? And why does my screen look like the background of this Explication?

Despite their complexity, these consequences are actually very intuitive, and it was a single bad value that propagated into this mess.

Stars in his eyes

The reason for the holes will make you say "Well, obviously!" once you know that this all begins with a single write of $03 to the dungeon submodule ID. The default submodule where the player has full control is submodule $00. Submodule $01 is for intraroom transitions and $02 for interroom transitions. The key piece of information we need to know about these transition submodules is that they are set by incrementing the current submodule once or twice, respectively. The assumption is that a transition can only be triggered when the player has control, so the current submodule should be $00. But, surprise! We can trigger these increments with a nonzero submodule.

The glitch we're discussing here is a consequence of triggering both transitions at once, which gives you either 0+1+2 or 0+2+1. Either way you get a value of $03 (the order of the operands can matter, but it's not relevant to this discussion). Which submodule is that?

Star tiles!

Or, more precisely, it's the hole overlay submodule that is mostly triggered by stepping on active star tiles, as well as by certain chests to create trap hazards upon opening.

Foot on the trigger

Star tiles (and chests) are the intended triggers for a holes overlay, although you could trigger one with a normal floor switch if there were any rooms with both a holes overlay and such a switch (there are not). By themselves, these floor switches don't actually do anything besides sit there looking fancy. All interaction with them is handled by the room tag controlling the overlay, but all we need to know is that once this interaction is detected, three critical things happen: the submodule is changed to $03, the overlay index variable at $04BA is set to the ID to use, and a 16-bit value at address $BA is set to $0000. This address is really important, so it'll be referred to as RDOBJX for "ReaD OBJect indeX" from now on.

There are 19 hole overlays in all, with 2 unpaired and 8 paired; 2 of the tags have overlapping pairs. Which overlay to use is hardcoded to the specific room tag being used, each of which just passes the ID of its overlay to a shared routine.

Now that the overlay is triggered, the new submodule will come into action on the next frame.

This submodule has some odd, likely vestigial behavior: despite overlays being handled during a single frame of gameplay, RDOBJX is checked for a nonzero value, and it skips over some prep work if it is not exactly $0000. This seems to give the submodule the ability to handle overlays whose properties would be initialized beforehand, and it was perhaps even planned to handle multiple frames of animation. In practice, this check only acts as the root cause of our corruption.

Star citizen

First, let's look at the intended case: when RDOBJX is $0000. The overlay index at $04BA is used to find a 24-bit pointer which gets stored in $B7 (we'll call this RMOBJPTR for "RooM OBJects PoinTeR"). This pointer and RDOBJX are used together in the subroutine that is called next to draw either a 4x4 hole or a 4x4 solid floor to the room's tilemap buffer. These tile updates are more or less hardcoded; that is, they don't attempt to run the same code used to draw holes and floors that exist by default when a room is loaded. There are two options: hole, or floor. Anything that isn't a hole is a floor.

This buffer is nothing more than that: a buffer. There's no automatic handling of it, and it's not visible to the video chip (the PPU). For a change to make its way to VRAM for display, it needs to be done with a manual call to a routine that can perform the transfer. Or the data could be moved to a different buffer that is handled automatically. And actually, tilemaps are 8 kilobytes per background. That's a lot to handle at once, especially when the actual changes only account for maybe 500 bytes. The smarter thing to do is divide the actual changes into smaller chunks and queue those up for transfer. This is exactly what the overlay code does.

Once the tilemap changes are finished, RDOBJX is reset to $0000 along with an unused variable. Immediately after that is where a nonzero (uninitialized) index skips to; in other words, the tilemap buffer update is skipped. So we should start considering the consequences of not updating the tilemap.

In both the tilemap buffer loop and the overlay loop that follows it, RDOBJX and RMOBJPTR are used to read through a list of objects defined by 3-byte entries. The end of the list is indicated with a sentinel (terminator) value of $FFFF. The base address of this list is stored in RMOBJPTR, and RDOBJX indicates how far into that list the current object being parsed is. The data uses the same format as the room objects read when a room is loaded, but, as already noted, it is manually parsed to obtain a tilemap location and object type, with the latter check being a binary decision.

To apply the actual overlay, the object list is parsed again to find the location of each object. That location is used to modify the collision map as well as create small chunks of tilemap updates. These chunks just read the existing tilemap and create 4 chunks of 4 tiles each for every hole or floor. This relies on the previous routines (or something else) having already updated the buffer. With overlay corruption, that update never happened; the updates will change tiles to what they already are. Even with a corrupted value, nothing outside of the tilemap will ever be read. The way position is encoded in object data doesn't leave any room for out of bounds values, although, objects positioned in the bottom three rows will bleed into the tilemap for the lower layer. Oh, yeah, overlays only ever apply to the upper layer!
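To make the "bleed" concrete, here is a minimal Python sketch of which tilemap buffer offsets a 4x4 object reads. It assumes a 64x64-tile buffer at 2 bytes per tile (the 8 KB per background mentioned earlier) and, as a labeled assumption, that the lower layer's buffer sits immediately after the upper layer's. `stamp_offsets` and `bleeds_into_lower_layer` are hypothetical names, not routines from the game.

```python
TILES_PER_ROW = 64
BYTES_PER_TILE = 2
LAYER_BYTES = TILES_PER_ROW * TILES_PER_ROW * BYTES_PER_TILE  # 8192 bytes (8 KB)

def stamp_offsets(x: int, y: int) -> list:
    """Byte offsets, relative to the start of the upper layer's buffer, read for a
    4x4 object at tile position (x, y): four slices of four vertical tiles each."""
    return [
        [((y + row) * TILES_PER_ROW + (x + col)) * BYTES_PER_TILE for row in range(4)]
        for col in range(4)
    ]

def bleeds_into_lower_layer(x: int, y: int) -> bool:
    # Any offset at or past 8 KB lands in the (assumed adjacent) lower layer buffer.
    return any(off >= LAYER_BYTES for column in stamp_offsets(x, y) for off in column)

print([y for y in range(64) if bleeds_into_lower_layer(0, y)])  # [61, 62, 63]
```

For an object in the leftmost column, only Y positions 61, 62, and 63 reach past the end of the upper layer's buffer, which lines up with the "bottom three rows" behavior described above.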

Like the tilemap buffer changes, the collision changes are a binary selection: it's either a hole or a floor, although in this case the default is floor, just because that's easier to check for. As such, any tile that doesn't correspond to the base floor tile will be given pit collision. An intended trigger will have just used this same list to update the tilemap buffer and thus will produce collision that matches the graphics, but if we happened to skip that, funny things happen.

Again, this is a binary choice. Hole or floor. Pick one. Any tile caught in the 4x4 square of tiles at the current coordinates will have its collision changed to one of those. The base floor becomes walkable and everything else becomes a pit. Was there a pot? It's now a pit. A wall? Pit. The right half of a big chest? PIT!
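A sketch of that binary collision rule in Python. The base floor tile value, the pot tile, and the collision labels are placeholders (hypothetical), since the real IDs aren't given here; the point is only that a single comparison decides everything.

```python
BASE_FLOOR_TILE = 0x0123  # hypothetical value, for illustration only
FLOOR = "floor"
PIT = "pit"

def collision_for_tile(tile: int) -> str:
    # Only the base floor tile stays walkable; anything else becomes a pit.
    return FLOOR if tile == BASE_FLOOR_TILE else PIT

def stamp_collision(tilemap, x: int, y: int) -> list:
    """Collision for the 4x4 square whose top-left tile is (x, y)."""
    return [[collision_for_tile(tilemap[y + r][x + c]) for c in range(4)]
            for r in range(4)]

# A 4x4 patch of base floor with a pot (hypothetical tile 0x0200) in one corner.
patch = [[BASE_FLOOR_TILE] * 4 for _ in range(4)]
patch[0][3] = 0x0200
result = stamp_collision(patch, 0, 0)
print(result[0])  # ['floor', 'floor', 'floor', 'pit']
```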

This explains why pits appear, why they're not visible on the screen, and even why they can be a nonsquare shape, but it doesn't explain why they appear where they do.

Where did I come from?

It is time to discuss RDOBJX and how much it breaks the overlay parsing. It and RMOBJPTR are not used exclusively for overlays, so they will likely have some unrelated value when an overlay is triggered improperly.

Whatever garbage is already there will be taken at face value. No matter what data (or code!) is being read, it will be treated as a list of room objects. With overlay corruption, only bits 2 through 7 and 10 through 15 of each entry's first two bytes (read as a 16-bit word) are relevant. These pairs of 6 bits encode the X and Y positions, respectively, as a value from 0 to 63. But if we're reading random garbage, these bits will essentially be random, and thus the positions of the objects will be random.
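As a sketch of how any three bytes become a position, this Python function pulls X from bits 2 to 7 and Y from bits 10 to 15 of the first two bytes, read as a little-endian word. The function name is hypothetical; the bit layout is the one described above.

```python
def object_position(entry: bytes) -> tuple:
    """Return (x, y) tile coordinates from the first two bytes of a 3-byte entry."""
    word = entry[0] | (entry[1] << 8)  # little-endian 16-bit read
    x = (word >> 2) & 0x3F             # bits 2-7
    y = (word >> 10) & 0x3F            # bits 10-15
    return x, y

print(object_position(bytes([0xFC, 0xFC, 0x00])))  # (63, 63)
print(object_position(bytes([0x11, 0x11, 0x11])))  # (4, 4)
```

Whatever the third byte holds is ignored here, matching the fact that the corrupted path never consults the object type.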

The garbage is not random though! The values left behind in those variables are very predictable, and, if we're careful, we can track changes to them in real time. We don't need to know their precise values, but we can associate actions we take with sets of positions.

First, we can consider the "standard" case: entering a very unspecial room and then triggering overlay corruption. In this scenario, the room has just finished loading, so RMOBJPTR points to the current room's data in ROM, and RDOBJX will be $0120. It really likes this value, because that's the size of the torches data. After a room is loaded, it searches for torches and adds them manually. Not really sure why this is done this way…but it means torches are the highest priority objects. It also means that any room with lightable torches will have a different value for RDOBJX, but that's not really important to know. With or without torches, our baseline corruption is looking at a room with data coming somewhere shortly after its actual build data.

The big problem here is that every room begins with a 2-byte header for some basic properties, and, like the overlay objects list, each layer is composed of 3-byte entries and ended with a sentinel $FFFF. The exception is the high priority upper layer, which is terminated with $FFF0. And actually, door objects are only 2 bytes each. But these are just technical details.

Because of that header and these mixed entry sizes, the submodule is not necessarily reading real objects and turning them into ~~pits and lands~~ holes and floors. Misalignment also lowers the chances of encountering the sentinel to stop, although we'll find that it pretty much always does in practice. At the end of each bank is unused space filled with $FF bytes, and any room with a single layer will, by definition, have no objects on the lower layer, giving two sentinels in a row. Having four consecutive $FF bytes makes it impossible to continue when reading three bytes at a time, no matter where we started before them. The short and cute proof by exhaustion is left as an exercise to the reader.
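For anyone who would rather let a computer do the exercise, here is a brute-force check in Python. It assumes the reader only checks the 16-bit word at the start of each 3-byte entry, and it uses filler bytes that contain no $FF so the only possible sentinel is the run being tested. `find_stop` and `run_always_stops` are illustrative names.

```python
def find_stop(data: bytes, start: int, step: int = 3):
    """Offset of the first entry whose leading 16-bit word is $FFFF, or None."""
    pos = start
    while pos + 1 < len(data):
        if data[pos] == 0xFF and data[pos + 1] == 0xFF:
            return pos
        pos += step
    return None

def run_always_stops(run_len: int, filler_len: int = 12) -> bool:
    """True if every starting offset up to a run of $FF bytes hits a sentinel."""
    filler = bytes(range(filler_len))  # 0..filler_len-1: contains no $FF bytes
    data = filler + b"\xff" * run_len + filler
    return all(find_stop(data, s) is not None for s in range(filler_len + 1))

print(run_always_stops(4))  # True
print(run_always_stops(3))  # False
```

Every starting offset falls into one of three alignments. A run of four $FF bytes catches all of them; a run of three lets the alignment that lands on its last byte read $FF followed by a non-$FF byte and keep going.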

Most rooms will thus read inappropriate data from the next room (or even the next next room), but "next room" isn't precise. By "next room", I mean next in terms of data location within the ROM. If you're curious about what exactly this order is, you can look at rooms.asm in my US disassembly; all the data is exactly the same and in the same order. The only difference is that the bank 03 rooms are shifted forward 16 bytes, which doesn't matter, because you're just looking at relative location and size. This is enough to look at, say, the ice rod cave and, given the size of its object data, see that overlay corruption starts reading from the start of the data for room 0112, a couple of caves in the Dark World.

Faded Encounter

The caveat to the information above is that not every room is boring and unspecial. The most common exception is a faded transition. Stairs, falling, and dark rooms all fade the screen in software by manually manipulating palette data. In doing so, RMOBJPTR is clobbered, as it is reused for a pointer to a table of bitmasks used by the fade algorithm. In the end, it will settle at $E8BC. This routine only uses this variable for a 16-bit pointer, and it never touches RDOBJX. For boring rooms, that means reading starts from $E9DC ($0120 bytes further) in the room's data bank, while rooms with lightable torches will start somewhere earlier.

Faded transitions can thus be split into three broad categories for overlay corruption based on which ROM bank the current room is in. The rooms in bank 1F are the most stable, because this entire bank is just room data, but banks 03 and 0A are very likely to cause serious issues, because they point at…other stuff… In bank 03, $E8BC is in the middle of a precalculated sine table, and $E9DC is part of hobo draw data. In bank 0A, this entire range is dungeon map code.

Bank 03 has four consecutive $FF bytes at $03EBD1, 789 bytes past $E8BC. In between, there are no consecutive $FF bytes, which means all bank 03 fade corruptions read this far.

Bank 0A has four consecutive $FF bytes at $0AEE6D, 1457 bytes past $E8BC, and no sentinels in between, so bank 0A fades will all read to this same point.

Remember these numbers.
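The arithmetic behind those numbers, as a small Python check. The constants are the addresses given above; the names are just labels for this sketch.

```python
FADE_POINTER = 0xE8BC   # where RMOBJPTR settles after a software fade
TORCH_INDEX = 0x0120    # RDOBJX after loading a room without lightable torches
BANK_03_RUN = 0xEBD1    # four $FF bytes at $03EBD1
BANK_0A_RUN = 0xEE6D    # four $FF bytes at $0AEE6D

start = FADE_POINTER + TORCH_INDEX
print(f"boring-room start: ${start:04X}")                                # $E9DC
print("bank 03 run is", BANK_03_RUN - FADE_POINTER, "bytes past $E8BC")  # 789
print("bank 0A run is", BANK_0A_RUN - FADE_POINTER, "bytes past $E8BC")  # 1457
```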

Dam it!

Pulling the switch in the dam to release the floodwater will perform a tilemap buffer update that reuses the pointer and index variables; RMOBJPTR gets set to $04EEAD, and RDOBJX finishes with $000F. This is another list of objects, but the end result is that these values are now pointing to a sentinel, so no erroneous object changes are made.

And who can forget the reason this all exists in the first place? Star tiles! What happens when you do a proper overlay followed by a corrupted one? Well, like pulling the dam switch, the pointer and index are reused to draw the overlays (this was kinda already mentioned). After a successful, glitch-less overlay, these variables will be looking at a sentinel from the previous operation.

For both of these, there's still a tiny bit of corruption though! The buffer is still capped off as normal, and the flag to flush it is set.

The Write Stripes

As mentioned earlier, we need a way to write small, arbitrary chunks of data to VRAM. This is handled with a queue located at $1100 and triggered by setting the flag at $18. Each entry in this buffer contains a 4-byte header followed by a payload. The header begins with a 16-bit VRAM address and is followed by an 8-bit write mode and an 8-bit size. If the VRAM address is $FFFF, the transfers stop (sound familiar?).

To get back to the small corruption from the dam and stars for a moment… The sentinel check only occurs at the end of the buffer loop. If the first VRAM address is $FFFF (which it will be, given that 0 objects were drawn), then it will be taken literally. The value translates to location $FFFE in VRAM (since it indexes 16-bit words). This slightly corrupts the last (unused in practice) character in BG3 then overflows back to the background 2 map, where it draws one tile each to the middle of the top three rows. The rest of the transfer should be valid from the previous operation and be unnoticeable.
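Here is a minimal Python model of the stripe queue as described above, including the end-of-loop sentinel check. Two details are assumptions rather than facts taken from the game: the 16-bit address is read little-endian, and the size byte is treated as the payload length in bytes. Neither matters for the $FFFF case. The function names and the stale bytes in the example queue are hypothetical.

```python
def read_word(buf: bytes, pos: int) -> int:
    # Little-endian 16-bit read (byte order is an assumption of this sketch).
    return buf[pos] | (buf[pos + 1] << 8)

def parse_stripes(buf: bytes) -> list:
    """Parse (address, mode, size, payload) entries until a $FFFF address.

    The sentinel is only checked at the end of each iteration, so the first
    entry is used even if its address is $FFFF.
    """
    entries = []
    pos = 0
    while True:
        addr = read_word(buf, pos)
        mode = buf[pos + 2]
        size = buf[pos + 3]
        entries.append((addr, mode, size, buf[pos + 4:pos + 4 + size]))
        pos += 4 + size
        # Stop at the sentinel (or, in this model only, at the end of our bytes).
        if pos + 1 >= len(buf) or read_word(buf, pos) == 0xFFFF:
            return entries

def vram_byte_address(word_addr: int) -> int:
    # VRAM is 32K 16-bit words, so word addresses wrap at $8000.
    return (word_addr & 0x7FFF) * 2

# Zero objects drawn: the queue starts with $FFFF, followed by stale bytes.
queue = bytes([0xFF, 0xFF, 0x00, 0x02, 0xAA, 0xBB, 0xFF, 0xFF])
print(parse_stripes(queue))                # [(65535, 0, 2, b'\xaa\xbb')]
print(f"${vram_byte_address(0xFFFF):04X}")  # $FFFE
```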

That's the best case scenario for corruption.

Remember: even though the tilemap buffer was never changed, a corrupted entry to the overlay submodule still uses the list of objects to create a list of transfers. One slice created by the overlay stamper is 4 vertical tiles; at 2 bytes per tile with a 4-byte header, that's 12 bytes per slice, or 48 bytes per object. As a ratio, that's 16 bytes written for every byte read.

The space allotted to this buffer (which is also memory shared by another buffer) is 2176 bytes. At 48 bytes for each 3-byte object, it takes only 46 objects to overflow into memory allocated for other stuff.
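The sizes above, worked through in Python. The constants come straight from the text; the names are just labels.

```python
HEADER_BYTES = 4        # 16-bit VRAM address + mode byte + size byte
TILES_PER_SLICE = 4     # one vertical strip of four tiles
BYTES_PER_TILE = 2
SLICES_PER_OBJECT = 4   # a 4x4 object becomes four vertical slices
ENTRY_BYTES = 3         # size of one object entry that gets read
BUFFER_BYTES = 2176     # space allotted to the stripe buffer at $1100

slice_bytes = HEADER_BYTES + TILES_PER_SLICE * BYTES_PER_TILE
object_bytes = SLICES_PER_OBJECT * slice_bytes
write_ratio = object_bytes // ENTRY_BYTES
first_overflowing = BUFFER_BYTES // object_bytes + 1

print(slice_bytes, object_bytes, write_ratio, first_overflowing)  # 12 48 16 46
```

Forty-five objects fit in the 2176 bytes (2160 bytes used), so the 46th is the first to spill past $1980.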

The first block of memory following the stripes buffer is all variables related to doors. Their type, direction, and position, as well as the position of special exit modifiers are all here. A lot of door behavior is handled by their tile type in the collision buffer, but actions that open a door—puzzle shutters, key doors, bomb/bonk walls, vines/curtains—rely on these page $19 variables for operation. If these variables get corrupted, these actions just don't work. An easy example to see is the kill room shutters in the first room of Castle Tower.

Following that is mostly stuff that gets reinitialized before it's needed. If you have a follower, it will walk funny for 20 steps. You can corrupt your message pointer and cause the game to crash with a subsequent blue YBA. But the only thing potentially useful involves the mirror portal, which will be discussed later.

There's just nothing of note here.

That is, until we reach $2000.

All of these writes occur with absolute addressing in data bank 00. What that means is that the code is actually operating on the WRAM mirror. This mirror only occupies the first 32 pages, or 8 KB in any bank with the mirror. Following it, on page $20 (a page is 256 bytes), is 256 bytes of open bus. After that is where I/O for the PPU lies. This is where things break bad.

Kaleidoscope

It takes 81 objects to reach $2000 (open bus) and an additional 5 (for 86) to reach $2100. Everything related to control of the display is here along with a couple ports for music and WRAM. Some of these registers—such as background mode, background scrolling, color math—are rewritten every frame from queues in WRAM, but others are not. Notably, the registers that tell the PPU where to look for background character data, background tilemaps, and sprite characters are not done every frame. The latter two groups are rewritten at various points as a one-off, but the background character pointers are only written once ever, on boot.
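A one-line model of how far the stripes reach. It assumes each object contributes exactly 48 contiguous bytes starting at $1100, as worked out above; `first_object_reaching` is a hypothetical helper.

```python
STRIPE_BUFFER = 0x1100   # start of the stripe queue
BYTES_PER_OBJECT = 48    # four 12-byte slices per corrupted object

def first_object_reaching(addr: int) -> int:
    """1-based index of the first object whose stripe bytes include addr."""
    return (addr - STRIPE_BUFFER) // BYTES_PER_OBJECT + 1

for target in (0x1980, 0x2000, 0x2100, 0x4200):
    print(f"${target:04X}: object {first_object_reaching(target)}")
```

This prints 46, 81, 86, and 262 for $1980 (just past the buffer), $2000, $2100, and $4200, respectively.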

Corrupting these registers is what causes extremely broken graphics. Strictly speaking, this is not VRAM corruption. There are tiny bits of corruption here and there from other sources, but those eventually get fixed, whereas these pointers require a console reset to return to normal. In and of itself, VRAM is not corrupted. It's sort of like reading a book upside-down. Everything in the book is perfectly fine and sensible; you're just looking at things wrong. In the same vein, these registers have changed how the PPU looks at video memory, and it has to take everything it sees at face value.

Danger

The next 8 kilobytes are just open bus. They're literally not connected to anything, so writing to these addresses does nothing. They can be connected, but, for this game, they're not. Three controller registers sit in a sea of open bus on page $40.

Page $42 is where death occurs.

It takes 262 objects to reach $4200, where the CPU registers live. That one object is enough to destroy all of these registers simultaneously, with the next object returning to open bus writes. The biggest problem is the first register: NMITIMEN at $4200. This register enables the 60 Hz NMI signal that lets the CPU sync code with video refresh, controls which triggers, if any, the IRQ raster interrupts use, and enables automatic controller polling.

Page $43 holds all of the DMA/HDMA property registers, but by now, it's already over. You're dead. It's just kicking you while you're down. The next 15,488 bytes are open bus, followed by 32 KB of ROM. In practice, it doesn't seem possible to get this far, but even if it is, those writes would do nothing (that's the "RO" in "ROM").

Brutal Murder

As an example of some really bad behavior, let's look at overlay corruption in Uncle Passage. This one goes really far—far enough to hit these CPU registers.

The first write is a $15 to NMITIMEN, which disables NMI, sets IRQ to use the horizontal trigger only, and enables automatic controller polling (sweet!). Said horizontal trigger, $4207 and $4208, is written with the values $3C and $22, setting the trigger point to horizontal position 60 on every scanline.
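Decoding those writes with a short Python sketch. The bit meanings are the standard NMITIMEN and HTIME layouts (bit 7 NMI, bit 5 V-IRQ, bit 4 H-IRQ, bit 0 automatic controller read; HTIME is 9 bits split across $4207 and bit 0 of $4208); the helper names are hypothetical.

```python
def decode_nmitimen(value: int) -> dict:
    """Decode the NMITIMEN ($4200) bits discussed in the text."""
    return {
        "nmi": bool(value & 0x80),          # bit 7: NMI at the start of V-blank
        "v_irq": bool(value & 0x20),        # bit 5: IRQ on scanline VTIME
        "h_irq": bool(value & 0x10),        # bit 4: IRQ at horizontal position HTIME
        "auto_joypad": bool(value & 0x01),  # bit 0: automatic controller read
    }

def htime(low: int, high: int) -> int:
    """HTIME is 9 bits: all of $4207 plus bit 0 of $4208."""
    return low | ((high & 0x01) << 8)

print(decode_nmitimen(0x15))
# {'nmi': False, 'v_irq': False, 'h_irq': True, 'auto_joypad': True}
print(htime(0x3C, 0x22))  # 60
```

With H-IRQ enabled and V-IRQ disabled, the IRQ fires once per scanline at that horizontal position, so it goes off on every line of the frame.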

An $08 sent to $420B runs DMA channel 3 (bit n of this register starts channel n). This channel hasn't done anything since the previous frame, so it's guaranteed to have a size of $0000, which translates to a 65,536-byte write. This channel has only been used for VRAM writes. So this just completely obliterated video memory.

A $15 is written to $420C, the HDMA enable register. Channels 0, 2, and 4 are now running HDMA with garbage values. None of these channels are expected to handle HDMA in this game, only channels 6 and 7. Subsequent writes on page $43 will modify what exactly these HDMA channels are doing.
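The channel mask decodes the same way for both the DMA and HDMA enable registers: bit n enables channel n. A tiny Python helper (hypothetical name) shows the $15 case next to the $C0 mask that corresponds to the game's intended HDMA channels, 6 and 7.

```python
def enabled_channels(mask: int) -> list:
    """Channels enabled by an MDMAEN ($420B) or HDMAEN ($420C) write: bit n is channel n."""
    return [channel for channel in range(8) if mask & (1 << channel)]

print(enabled_channels(0x15))  # [0, 2, 4]
print(enabled_channels(0xC0))  # [6, 7]
```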

After all that, the game is still running fine. For now. It hasn't actually crashed; the CPU is still executing perfectly legal code, and it even manages to complete the frame and return to the beginning of the game loop where it waits for the next frame. Forever.

The CPU is stalled at the end of a frame of game play, waiting for an interrupt that will never occur. Instead, a different interrupt is firing off 262 times every frame—over 15,000 times per second. The IRQ routine eats up about 25% of every scanline just to do nothing but exit. Half the screen is black, because the HDMA on channel 4 is disabling it; only by pure coincidence does it make the screen visible again at the top of the frame. Channels 2 and 4 are writing to the beam position registers. As those are read only, these channels are doing absolutely nothing of effect.

But, hey, the game hasn't technically crashed! It's just hardlocked. A really really hard lock.

But wait! There's more!

This is still only half of the story. Remember the reason these writes even occur: to populate a buffer with VRAM transfers. These transfers are built from the tilemap that wasn't updated like it should have been. Data is copied verbatim from this tilemap buffer; it's not changing anything; it's exactly what's already there. With an oxymoronically clean corruption, nothing will look different.

That again changes once open bus is reached. It's a lot easier to read far than it is to write far, because writing is stopped by a sentinel in the ROM, and there's often one within reach. As soon as reading reaches open bus, well…

Open bus will initially just throw out endless $11 bytes; this comes from the last byte of the operand that does the read. Page $21 starts acting weird. Open bus here defers to the value on the PPU's bus, which gives it the ability to return a lot of different values and can even depend on which song or sound effect is playing. This makes the exact consequences of any particular corruption hard to predict.

For this to stop, the code needs to eventually read a sentinel value at the start of a chunk. This can't be guaranteed the same way it could for writing the buffer. It could technically happen on page $21, but that's extremely unlikely. The first real opportunity is uninitialized memory on page $43, which happens to read $FFFF, but these are small targets. If those are missed, the next opportunities are in ROM, where decently-sized blocks of $FF bytes exist at $0089C2, $0098AB, $00CF46, $00E892, $00F7E1, and $00FFB7. There are also random candidates littered about, but they're also small targets. If none of these are hit, reading will eventually wrap around. This technically allows infinitely many opportunities to hit each potential sentinel, and, statistically speaking, things should stop eventually.

Everything here save the few hardware registers is going to be an onslaught of $11s. That's over 24,000 bytes. A big transfer from random PPU open bus garbage may skip over a lot, but we're generally looking at hundreds, if not thousands, of 17-byte transfers to the same locations in VRAM.
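To put a rough number on "hundreds, if not thousands", here is a Python estimate. It assumes every header byte reads as $11, that the size byte is the payload length (17 bytes), and it ignores the hardware registers and PPU open bus values that break up the run. `flood_transfers` is a hypothetical helper.

```python
def flood_transfers(total_bytes: int, fill: int = 0x11) -> tuple:
    """Estimate stripe entries parsed from a run of identical open bus bytes.

    Assumes a 4-byte header (16-bit address, mode, size) and that the size byte
    is the payload length in bytes.
    """
    address = fill | (fill << 8)   # both address bytes read as $11
    entry_bytes = 4 + fill         # header plus a 17-byte payload
    return total_bytes // entry_bytes, address

count, address = flood_transfers(24_000)
print(count, f"${address:04X}")  # 1142 $1111
```

Under these assumptions, roughly 24,000 bytes of $11 become over a thousand identical 17-byte transfers, all aimed at the same VRAM address.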

All this while, these garbage transfers are corrupting VRAM—actually corrupting, not just flipping a book upside-down. That's all that's being corrupted, though. While it can get pretty ugly (or beautiful, if that's your jam), it's not anywhere near as dangerous as the out-of-bounds writes setting up the transfers.

Machine broke

Two additional caveats exist for how dangerous overlay corruption is.

Early revisions of the Super Famicom had a hardware bug that would cause the console itself to crash when an HDMA transfer fired just as a DMA transfer was finishing. Far-reaching corruption has a lot of potential to set this off. Of particular note is the first room in the back of Skull Woods, which is an otherwise safe corruption that crashes on certain machines.

Despite claims of accuracy that exceeds software emulators (false), the SuperNT (which is, in fact, an emulator) can't seem to handle open bus writes and will just crash fairly often. I don't know what's happening; perhaps it uses this space for its own interfaces. I don't really care. I just know that real consoles and modern software emulators don't crash under such circumstances.

Similar crashes can occur with certain FXPak features, but those can be disabled easily for a safe playing experience.

Fake mirror

The most interesting thing to look at in the corrupted game variable space is the mirror's coordinates. The fun thing about this portal is that it always exists in the Light World, even without the mirror. There's no flag to disable the mirror portal, just a special case for coordinate {0,0}. But even there the sprite exists and is functional. Turning on out-of-bounds mode and walking into the northwest corner of Lost Woods will trigger a warp.

The behavior of overlay buffer writes is always the same, which helps narrow down where the coordinates come from:

The low byte of the X-coordinate at $1ABF will come from the high byte of a tile in the tilemap buffer.
The high byte of the X-coordinate at $1ACF will always be $08 or $09.
The low byte of the Y-coordinate at $1ADF will come from the high byte of a tile in the tilemap buffer.
The high byte of the Y-coordinate at $1AEF will come from the high byte of a tile in the tilemap buffer.

The consistency of the X high byte means that the mirror portal will always end up in the same column of screens as Link's house. The low bytes of each coordinate are less important, only determining the precise position. But they're only less important because of the inconsistent and relatively unpredictable nature of the Y high byte. Any value higher than $0F is off the map. If the value is random (it's not, but assume it is for this point), the portal is only in bounds 1/16 of the time.

There's a restriction on three of the position variables due to palette allocation: at least one of bits 3 and 4 must be set. This is because bits 2, 3, and 4 determine the palette of a tile, and the space for palettes 0 and 1 is used by the HUD. Thus, the smallest coordinate that can normally be had is $08. The one exception to this is in rooms with a transparent floor; with the entire object being invisible, it doesn't matter which palette is used, so it just uses 0. Combining that with the upper bits of the character name of those tiles gives an exception to the minimum: $01. But that's it. There's nothing in between. Similarly, these values can never have bit 1 set. Tile names are 10 bits in length, but the game only uses half of that space.

For the Y high byte, this puts the portal's furthest possible position north just inside the Hyrule Castle courtyard. Or, with transparent floors, just below the Tower of Hera. The character name limitation means there will never be values of $0A or $0B (Link's house), or $0E or $0F (east of the dam). The only screens that can contain an in-bounds mirror portal are: Tower of Hera ($01), Hyrule Castle ($08 or $09), and south of Link's house ($0C or $0D).
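The constraints above can be enumerated mechanically. This Python sketch encodes them as stated: bit 1 is never set, bit 3 or bit 4 must be set, and transparent floors add $01 as the one exception. Values above $0F are off the map. The function name is hypothetical.

```python
def possible_high_bytes(transparent_floor: bool = True) -> list:
    """High bytes a tilemap entry can have under the constraints in the text."""
    values = set()
    for v in range(256):
        if v & 0x02:          # bit 1 (tile name bit 9) is never used
            continue
        if v & 0x18:          # palette 2 or higher: bit 3 or bit 4 is set
            values.add(v)
    if transparent_floor:
        values.add(0x01)      # palette 0 plus tile name bit 8
    return sorted(values)

in_bounds = [v for v in possible_high_bytes() if v <= 0x0F]
print([f"${v:02X}" for v in in_bounds])  # ['$01', '$08', '$09', '$0C', '$0D']
```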

Summary

Overlay corruption occurs as an unintended entry into the holes overlay submodule via various transition corruptions. The location of the holes comes from misaligned room data or unrelated code and data. Per room, these holes are consistent, but the exact effect is also dictated by fading transitions and previous overlay changes within the same room.

Most corrupted WRAM data is uninteresting or reinitialized before it's needed. The only effect of note is moving the coordinates of the mirror portal, but with very limited application. More often than not, the portal is just moved to an unreachable location off the map.

"VRAM corruption" is both real and wrong. Very nasty corruption can occur, but the persistent and unfixable garbling is caused by garbage writes to registers that tell the video chip where to find character data. These pointers are only written when the game boots up, so they stay broken until a hard reset. Most of the actual corruption in video memory is cleaned up during large graphics changes such as transitions.

Hardlocks that aren't technically crashes occur when CPU control registers are written to with garbage. If corruption reaches that far, it's guaranteed to cause problems.

Some older consoles have problems with crashing in general because of a hardware glitch that occurs when DMA and HDMA are running at the same time. These crashes can occur even in cases where a later console would remain stable.

There are actually 64 APU I/O addresses, but the latter 60 are just mirrors of the first 4.

NTSC reference?!?!?.

Actually, VRAM might be mostly fine. The CPU can only write to it during V-blank or forced blank, and forced blank isn't necessarily enabled. It probably gets to corrupt stuff during V-blank but not for the entire frame.