MCC Halo 2 'BSP Crash' Fix
2021-06-09 Update: 343 has responded to my support ticket that they are aware of the issue :)
2021-10-13 Update: 343 fixed the crash in Season 8 (v2580)! Unfortunately this patch also broke a ton of speedrunning tricks, but at least there are no crashes?
2024-08-23 Update: Fixed some broken links and removed Twitter link, since I deleted my account after Musk purchased it.
Overview
For many months, a game crash has been plaguing the MCC Halo 2 speedrunning community. Seemingly random and inconsistent, it mostly struck players doing IL (Individual Level) speedruns. The crash gained its name from the bright green text that appeared in the bottom right of the screen during the game’s final moments.
Credit: Dubhzo
The oldest clip I could find of this style of crash is from early August 2020, and we believe the crash originated from one of the MCC patches around that period. Curiously, the patch notes for the July 14th 2020 release mention “Halo 2: Resolved a crash that could occur during long playthroughs”. Hmmm.
Digging Deeper
After some hot tips from the HaloRuns community for reproducing the crash and some analysis of crash dumps, I figured out where in memory the relevant data was stored. The memory layout for that section looks something like this:
next_index starts at 1 because the 0th element in the buffer is used as an offset to add to pointers read from the buffer. On PC this offset seems to always be 0 and is not relevant to the bug itself.
Every time the game loads content, a new value is stored in p_buffer at the index of the current value of next_index. next_index is then incremented by 1 and the value of the previous index is returned. The pseudocode for the function looks like this:
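A minimal C++ sketch of that behaviour, reconstructed from the description above rather than taken from the game. The type `PointerTable`, the constant `kCapacity`, and the 64-bit slot type are assumptions, with the capacity guessed from the 32768 threshold discussed in this post. The function names follow the symbol table near the end of this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed capacity, guessed from the 32768 threshold mentioned in this post.
constexpr std::size_t kCapacity = 32768;

// Hypothetical stand-in for the in-game table.
struct PointerTable {
    std::array<std::uint64_t, kCapacity> p_buffer{};  // slot 0 holds the base offset
    std::uint32_t next_index = 1;                     // slot 0 is reserved, so start at 1
};

// Stores `value` at next_index, bumps the index, and returns the slot used.
// There is deliberately no upper bound check here, matching the description.
inline std::uint32_t insert_pointer(PointerTable& t, std::uint64_t value) {
    const std::uint32_t index = t.next_index;
    t.p_buffer[index] = value;  // out of bounds once index reaches kCapacity
    t.next_index = index + 1;
    return index;
}

// Reads a slot back, adding the offset stored in slot 0 (0 on PC).
inline std::uint64_t get_pointer_by_index(const PointerTable& t, std::uint32_t index) {
    return t.p_buffer[0] + t.p_buffer[index];
}
```

The first call returns index 1, the second returns 2, and so on. Nothing ever winds the index back down during normal play.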
The bug here is that next_index has no upper bound and only has 1 reset condition: the first time the level is loaded. It grows and grows until it is larger than 32768, pointing beyond the end of p_buffer and corrupting other regions of memory. First it overruns onto the debug information which causes the familiar green BSP and POS counters to be visible. After that, critical runtime and scenario data is overwritten, at which point the game cannot cope and crashes.
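To make the overrun concrete, here is a toy model in C++ that treats memory as one flat array: the table first, then a small "debug" region right behind it. The layout, `FakeMemory`, `kTableSlots`, and `kDebugSlots` are all assumptions for illustration. The real neighbours of p_buffer are only known here to include the debug data. The model stays within its own array, so it shows the effect without actually corrupting anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTableSlots = 32768;  // assumed table capacity
constexpr std::size_t kDebugSlots = 16;     // hypothetical size of the debug region

// Flat model of memory: [ table slots | debug slots ].
struct FakeMemory {
    std::vector<std::uint64_t> cells =
        std::vector<std::uint64_t>(kTableSlots + kDebugSlots, std::uint64_t{0});
    std::size_t next_index = 1;  // slot 0 is reserved for the offset

    // Insert with no check against kTableSlots, as in the bug.
    // The size check below only protects the model itself.
    bool insert(std::uint64_t value) {
        if (next_index >= cells.size()) return false;
        cells[next_index++] = value;
        return true;
    }

    // True once any write has landed past the end of the table.
    bool debug_region_corrupted() const {
        for (std::size_t i = kTableSlots; i < cells.size(); ++i) {
            if (cells[i] != 0) return true;
        }
        return false;
    }
};

// Returns how many inserts it takes, including the one that first
// lands outside the table.
inline std::size_t inserts_until_corruption() {
    FakeMemory mem;
    std::size_t count = 0;
    while (!mem.debug_region_corrupted() && mem.insert(0xDEADBEEF)) {
        ++count;
    }
    return count;
}
```

Under these assumptions the 32768th insert is the first one to write outside the table, which lines up with the debug text appearing before the crash proper.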
Data corruption bugs are usually some of the most difficult to trace and debug. We were extremely lucky that the debug register for showing the BSP text on screen was the first thing to get corrupted. Being able to find the format string for that text and working backwards from there made the process very smooth. The full format string for that text actually looks like this:
This itself might have been an earlier version of the pan-cam stats that later games used. I really like finding these little debugging tools that developers made for themselves.
Repro
Since we now know the exact cause of the crash, making a repro was easy. Quarantine Zone was by far the worst level for triggering the crash, so the optimal strategy for reproducing it is the following:
- Start up QZ
- Drive to the 2nd shutter door
- Restart the level
- Repeat for roughly 9 minutes until next_index grows above 32768 and corrupts memory.
A faster alternative with access to debugging tools is to just set next_index to a value close to the threshold, say 32000. This should cause the crash quickly depending on the load zones in the level.
Resolution
Typical players will almost never run into this crash. It relies on restarting and running through the level over and over, dozens of times so that the index grows large enough to crash. What kind of player would do such a thing?
Oh right, speedrunners.
If you’re grinding for a good IL time, you will constantly be restarting the level after any mistake. This could be dozens or hundreds of attempts in a single session. That’s why this crash hasn’t been more widely reported by the playerbase: they just don’t play enough!
Until this is officially fixed by 343, I have released a code patch to fix this bug. The basic gist of the solution is to hook the functions that modify the p_buffer table and redirect them to write into a std::vector that can dynamically grow to any size. The p_buffer table is then not used any further. While this doesn’t solve the underlying issue of poor memory management, it does allow for vastly longer play sessions without crashing.
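The idea can be sketched roughly as follows. This is not the released patch, only an illustration of the std::vector approach. The class `GrowablePointerTable` and its method names are made up for the example, and the mapping of each method to clear_pointer_table, insert_pointer, and get_pointer_by_index is an assumption. The function hooking itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical replacement storage that the hooked functions would use.
class GrowablePointerTable {
public:
    GrowablePointerTable() { clear(); }

    // Assumed stand-in for clear_pointer_table: keep only reserved slot 0.
    void clear() { slots_.assign(1, std::uint64_t{0}); }

    // Assumed stand-in for insert_pointer: append and return the new index.
    // The vector reallocates as needed, so there is no fixed capacity to overrun.
    std::size_t insert(std::uint64_t value) {
        slots_.push_back(value);
        return slots_.size() - 1;
    }

    // Assumed stand-in for get_pointer_by_index: add the slot 0 offset.
    // at() throws on a bad index instead of reading stray memory.
    std::uint64_t get(std::size_t index) const {
        return slots_.at(0) + slots_.at(index);
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<std::uint64_t> slots_;
};
```

Indices still start at 1 and keep climbing, so callers see the same values as before. The difference is that the 40000th insert simply makes the vector bigger instead of writing over whatever sits behind a fixed buffer.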
The code for the patch is available here.
Final Thoughts
Big thanks to Harc for helping me with debugging and testing: Check out his Twitch
Did somebody say HaloRuns.com? If you’re interested in speedrunning any game in the Halo series, give the HaloRuns site and Discord a peek.
If anyone from 343 is reading, the relevant functions and addresses used are here:
| Game Version | MCC 2282, Steam |
| next_index | halo2.dll+CD8098 |
| p_buffer table | halo2.dll+E22370 |
| clear_pointer_table | halo2.dll+6DF770 |
| get_pointer_by_index | halo2.dll+6DF7A0 |
| insert_pointer | halo2.dll+6DF7B0 |

| Game Version | MCC 2406, Steam |
| next_index | halo2.dll+CD9098 |
| p_buffer table | halo2.dll+E23110 |
| clear_pointer_table | halo2.dll+6DF710 |
| get_pointer_by_index | halo2.dll+6DF740 |
| insert_pointer | halo2.dll+6DF750 |
pls fix
More examples of crashes:
Harc:
EggplantHydra:
Temperament:
ibigblue:
Raiyuki:
This one is really interesting because it overflowed just enough to write into the debug data and enable the BSP debug text, but not far enough to actually cause a crash until the next load. The artifacts you see flailing around are caused by other parts of code still reading/writing to the addresses that are now shared with the overflowed p_buffer.
Bonus
Halo 2 Mobile?