After the popularity of the earlier articles, we’re happy to share another devlog by Rapha, diving deep into a very obscure bug you never saw in the live game, and why it matters!
We hope you enjoy reading it!
Background: We had a phantom message appearing in chat while testing a specific update:
> “ turned from player to spectator”
Who was this unnamed person? Where did this come from?
But to make sense of the problem, first we need to dig deep and start from the basics!
*This post mostly isn’t about a specific computer architecture, unless otherwise specified*
*Code examples are written in a Pascal-esque way for easier understanding*
The ideal computer runs instructions in series:
*[Figure: CODE listing]*

That’s all well and good, but not every program can be linear: memory isn’t infinite, and problems can’t always be completely unrolled at compile time to make that possible. Sometimes you need to go back and redo a calculation. That’s why programs JUMP *Boing, JUMP, like a kangaroo!*
So something like repeatedly increasing a number can be done with:
*[Figure: CODE and DATA listings]*
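
To make the idea concrete, here is a minimal Python sketch of a toy machine. The instruction names `INC` and `JUMP`, the addresses, and the `run` helper are all assumptions made up for this example, not any real architecture. Since the program jumps back to 0x0000 forever, the simulation stops after a fixed number of steps.

```python
# Toy machine: CODE maps addresses to instructions, data maps addresses to values.
# Everything here (opcodes, addresses, step limit) is hypothetical.
CODE = {
    0x0000: ("INC", 0x2000),   # increase the number stored at data address 0x2000
    0x0001: ("JUMP", 0x0000),  # go back and do it again
}

def run(code, data, max_steps):
    pc = 0x0000  # program counter: address of the next instruction
    for _ in range(max_steps):
        op, arg = code[pc]
        if op == "INC":
            data[arg] += 1
            pc += 1      # running in series: move on to the next instruction
        elif op == "JUMP":
            pc = arg     # break the series: continue from another address
    return data

data = run(CODE, {0x2000: 0}, max_steps=10)
print(data[0x2000])
```

Ten steps alternate between five `INC` and five `JUMP` instructions, so this prints 5.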

But what if there’s something after that JUMP 0x0000 that needs to be executed afterwards? What if we could jump somewhere else and still know where to go back? That’s when the stack is useful.
The stack is a data structure that is normally filled from the bottom to the top (imagine a stack of dishes). Let’s picture it here: suppose the bottom is the last address we can use, in this example 0xFFFF.

One of its important functions is keeping track of where you were before. A more complex kind of “JUMP” instruction stores the address of the instruction that follows it on the stack, and a special register known as the “stack pointer” marks where the stack currently ends, so we know where to go back afterwards. Let’s call this instruction “CALL”, and let’s say the stack pointer currently points to 0xFFFF in our stack above.

As soon as that CALL executes, it adds the return address 0x1008 to our stack and changes the stack pointer register accordingly, to 0xFFFB.

That way, once the function ends and executes what we’ll call a RET[URN] instruction, it fetches that value, 0x1008, from the stack, moves execution there, and updates the stack pointer again so that it goes back to 0xFFFF.
That’s how computers keep track of the execution flow!
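
The CALL and RET steps described above can be sketched in Python. This is only a toy model: the `Machine` class, the address of the CALL instruction (0x1004), the function address (0x2000), and the 4-byte address size are assumptions, chosen so the numbers line up with 0x1008, 0xFFFB, and 0xFFFF from the text.

```python
# Toy model of CALL/RET. The 4-byte address size is inferred from the stack
# pointer moving from 0xFFFF to 0xFFFB; the other addresses are hypothetical.
ADDRESS_SIZE = 4

class Machine:
    def __init__(self):
        self.memory = {}   # sparse memory: slot address -> stored value
        self.sp = 0xFFFF   # stack pointer: where the stack currently ends
        self.pc = 0x1004   # hypothetical address of the CALL instruction

    def call(self, target):
        return_address = self.pc + ADDRESS_SIZE  # instruction right after the CALL
        self.sp -= ADDRESS_SIZE                  # make room on the stack
        self.memory[self.sp] = return_address    # remember where to come back to
        self.pc = target                         # jump into the function

    def ret(self):
        self.pc = self.memory[self.sp]           # fetch the saved return address
        self.sp += ADDRESS_SIZE                  # give the stack space back

m = Machine()
m.call(0x2000)                         # hypothetical function address
print(hex(m.sp), hex(m.memory[m.sp]))  # 0xfffb 0x1008
m.ret()
print(hex(m.sp), hex(m.pc))            # 0xffff 0x1008
```

Note that `ret` only moves the stack pointer. The saved value stays in memory until something overwrites it.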
But the stack also has another function: on most architectures, it stores… local variables!
> Their counterparts are global variables, which are stored inside the
> executable’s memory space, and data in the heap, a memory space that
> you ask the OS for and that, if not properly released when no longer
> used, ends up as that thing people love to call a “memory leak”
Imagine you have a function that’s something like:
    function add_two_numbers(a: Integer; b: Integer): Integer;
    var
      c: Integer;
    begin
      c := a + b;
      Result := c;
    end;
For convenience, that variable c will be stored on the stack. That’s why, whenever you call that function, it runs some wrapper code that I won’t explain, but that basically moves the stack pointer up so the function can use the space below it. Something like:

As soon as the function exits, it simply undoes that modification to the stack pointer before fetching the address to return to:

Observe how c is still there: in most language implementations it’s too expensive to go around zeroing that memory, so the value just lingers.
Now imagine a similar function that blindly believes that value is zero and is called right afterwards:
    procedure prints_value(a: Integer);
    var
      b: Integer;
    begin
      // b is never initialized: it holds whatever was left on the stack
      b := b + a;
      WriteLn('The result is ', b);
    end;
You see where this is going, right? Its variable will need a place on the stack:

But that memory still contains the result of the previous operation, so when b + a is executed, it’s not computing 0 + a but whatever was there + a. This is undefined behavior: we cannot know what is there, so it could be a totally different value, and our function can’t be trusted to execute properly.
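
Here is a small Python sketch of that sequence. Python never exposes uninitialized memory on its own, so the leftover value is modelled by hand with a list that plays the role of the stack. The list size, the `sp` variable, and the argument values are assumptions for the example, and `prints_value` returns its text instead of printing it so it is easier to inspect.

```python
# Simulated stack of integer slots; all names and sizes are illustrative.
stack = [0] * 8
sp = len(stack)  # stack pointer; this stack grows toward lower indexes

def add_two_numbers(a, b):
    global sp
    sp -= 1                # reserve one slot for the local variable c
    stack[sp] = a + b      # c := a + b
    result = stack[sp]
    sp += 1                # release the slot; its contents are NOT cleared
    return result

def prints_value(a):
    global sp
    sp -= 1                # reserve one slot for b: the same one c just used
    stack[sp] = stack[sp] + a   # b := b + a, but b was never initialized
    result = stack[sp]
    sp += 1
    return "The result is " + str(result)

add_two_numbers(2, 3)
print(prints_value(10))
```

The call prints "The result is 15" rather than "The result is 10", because the 5 left behind by `add_two_numbers` is still sitting in the slot that b reuses.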
You never want that, unless you are doing something malicious!
The Bug
But why such a long post to explain this concept? Well, during the internal testing of a new build we had a bug.
A message was being printed saying:
> “ turned from player to spectator”
without a name.
It only started happening when we changed some of our logging code. Why???
Well, there’s a function that sends the player the current information the host knows about the players and spectators it started with:
    procedure sends_stuff_to_client;
    var
      the_data_to_send: SOME_BIG_STRUCTURE;
    begin
      // some fields of the_data_to_send are filled in and sent; the rest are never initialized
    end;
That the_data_to_send variable was allocated on the stack and was never fully initialized. Since *ab immemorabili*, time immemorial, the bits we weren’t initializing had been inheriting whatever was on the stack before. It always *seemed to work*, until we changed how our log messages were formatted for security reasons.
Now a message to be logged that contained both text and numbers was formatted more or less like this:
    game_log_message(format_message('This happened with this {} value', value));
The return value of format_message isn’t magically stored away if it’s big, so the compiler was just adding a new, unnamed value to the stack. In a function call that executed before sends_stuff_to_client, this shifted the stack layout only slightly, but enough that when the_data_to_send was allocated, a specific part of it lined up with data lingering from previous function calls. That data was specifically the SteamID of the host, and it was ending up exactly at the spot in the_data_to_send that contained the SteamIDs of spectators.
So that stuff was sent to clients, and the clients would interpret it as *“well, this person, with an ID pertaining to a player, was changed to a Spectator”*. Then the code that logs that would check *“what is his name really?”* and find nothing, because there was no name associated with that spectator entry. So we got:
*“ turned from player to spectator”*
It didn’t affect the game in any way, since no state was being changed based on that, but every time that packet was sent to the client, it had a specific mis-initialization that caused that message to be displayed.
The coolest thing in this story (but also the scariest one, since it made us run around in circles until the cause was found out) is that it only happened on Windows. Only the Microsoft C++ Compiler was generating this specific stack structure and making this happen. Our Linux and macOS versions, compiled by GCC and Clang respectively, just didn’t display the bug: the memory was still uninitialized there, but the layout didn’t match closely enough for that specific SteamID to end up aligned where we were expecting a SteamID.
The fix was easy, and should have been there from the start:
    FillChar(the_data_to_send, SizeOf(the_data_to_send), 0);
This ensures the whole structure starts out as zero with nothing lingering, and so the bug never happened again.
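
To see both the bug and the fix side by side, here is a rough Python simulation. The real layout of the_data_to_send isn’t shown in this post, so the 16-byte layout below (a 32-bit player count, 4 padding bytes, one 64-bit spectator SteamID), the SteamID value, the `format_and_log` helper, and the `zero_first` parameter are all invented for illustration. A `bytearray` stands in for the stack memory that every call reuses.

```python
import struct

# Hypothetical 16-byte layout for the_data_to_send: a 32-bit player count,
# 4 padding bytes, then one 64-bit spectator SteamID.
PACKET = struct.Struct("<I4xQ")
HOST_STEAM_ID = 76561198000000001  # made-up SteamID for illustration

stack = bytearray(64)  # simulated stack memory, reused by every call

def format_and_log():
    # An earlier call leaves the host's SteamID behind in its stack frame.
    struct.pack_into("<Q", stack, 8, HOST_STEAM_ID)

def sends_stuff_to_client(zero_first):
    frame = memoryview(stack)[0:PACKET.size]   # the_data_to_send lives here
    if zero_first:
        frame[:] = bytes(PACKET.size)          # stands in for FillChar(..., 0)
    struct.pack_into("<I", frame, 0, 2)        # only player_count gets written
    return PACKET.unpack(frame)                # what the client would decode

format_and_log()
print(sends_stuff_to_client(zero_first=False))  # spectator slot holds the host's ID
format_and_log()
print(sends_stuff_to_client(zero_first=True))   # spectator slot is 0
```

Without the zero fill, the decoded packet carries the host’s SteamID in the spectator slot, even though nobody ever wrote it there. With the fill, the slot is 0 no matter what ran before.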
Conclusion
In our case this just caused a message to appear, but it could have been a security problem: important data could leak via a similar bug. That’s why an important security feature implemented in some compilers is zeroing the stack on function exit. It doesn’t matter what was there, it turns into zero, so no one will be able to see it by mistake (or on purpose) afterwards.
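
As a rough picture of what zeroing on exit buys you, here is one more simulated stack in Python. The function names, the `wipe_on_exit` flag, and the secret value are hypothetical, and the wipe is written by hand where a compiler with such a feature would insert it for you.

```python
# Simulated stack; names and sizes are illustrative only.
stack = [0] * 4
sp = len(stack)  # stack pointer; this stack grows toward lower indexes

def handles_secret(secret, wipe_on_exit):
    global sp
    sp -= 1
    stack[sp] = secret       # local variable holding sensitive data
    if wipe_on_exit:
        stack[sp] = 0        # what a stack-zeroing compiler would insert
    sp += 1

def peeks_at_uninitialized_local():
    global sp
    sp -= 1
    leaked = stack[sp]       # never assigned: reads whatever lingers
    sp += 1
    return leaked

handles_secret(1234, wipe_on_exit=False)
print(peeks_at_uninitialized_local())  # 1234
handles_secret(1234, wipe_on_exit=True)
print(peeks_at_uninitialized_local())  # 0
```

The second function never asks for the secret, yet it sees it unless the first one cleans up on its way out.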
I know this wasn’t the easiest post to follow, and that people normally aren’t interested in such specific stuff, but hopefully you liked it!