State Machines: Cooperative Mini-kernels with Yielding

Zachary Booth Simpson
Computer Game Developer's Conference, Austin, TX
16 Nov 1999

(c)2002 ZBS.
Please sign my guestbo0k if you find this work useful.


The Controller Pattern[ZS1] is often implemented as a simple interface with a single abstract method: "doProcess" function. Instances of classed which implement the Process interface are aggregated into a list of processes inside of a Mini-kernel[ZS1] which gives each class an opportunity to execute each game frame by calling the overloaded "doProcess".

Example implementation:

01. class Process {
02.     virtual void doProcess() = 0;
03. };
05. class CycleAnimation : Process {
06.     GameObject animObject;
07.     CycleAnimation( GameObject animObject ) {
08.         this->animObject = animObject;
09.         Kernel::addProcess(this);
10.     }
11.     virtual void doProcess() {
12.         animObject.frame = (animObject.frame + 1) % animObject.numFrames;
13.     }
14. };

When an instance of CycleAnimation is created, it puts itself into the Kernelís list with the Kernel::addProcess (line 9).

Once it is added to the list, the doProcess() function will get called once per game frame until it dies. Notice that, obviously, this is not a thread. If the doProcess() goes into an infinite loop, the game will lock up. That is, doProcess() is a blocking call from the point of view of the mini-kernel. Thatís why the mini-kernel is said to be cooperative.


It is common that a simple process like this becomes complicated enough that it must track the state of the controller. In the previous example, the only state is the animObjectís frame, which is used in a very simplistic way Ė it is modulo-incremented.

Letís take a more complex animation example:

    01. class WalkAnim : Process {    
    02.     int state, startFlag, stopFlag;    
    03.     GameObject animObject;    
    04.     WalkAnim( GameObject animObject ) {    
    05.         this->animObject = animObject;    
    06.         Kernel::addProcess(this);    
    07.         state = STANDING; startFlag = stopFlag = false;    
    08.     }    
    09.     virtual void doProcess() {    
    10.         // Frames: 1-3=starting, 4-10=walking loop, 11-13=stoping    
    11.         switch( state ) {    
    12.             case BEGIN: frame = 1; state = STARTING; break;    
    13.             case STARTING: if(frame++>3) state = WALKING; break;    
    14.             case WALKING:    
    15.                 if(stopFlag && frame==10)     state = STOPPING;    
    16.                 animObject.frame = 4 + (animObject.frame+1)%7;    
    17.                 break;    
    18.             case STOPPING: if(frame++>13) state = STANDING; break;    
    19.             case STANDING:    
    20.                 if(startFlag) {    
    21.                     state = BEGIN; stopFlag = startFlag = false;    
    22.                 }    
    23.                 break;    
    24.         }    
    25.     }    
    26. }

In this example, there is a state variable which tracks which state of a walk animation we are in. There are two semaphores which allow the process to determine if it needs to transition out of or into walk. These flags are needed because of the state-dependent nature of the animation (i.e. the stopping animation assumes that the legs are in a certain position). This is a very typical state machine.

Note that the setting of the start and stop flags is not demonstrated here. The messaging that takes place in order to set these flags is another topic altogether, and is skipped in this analysis.

This method of implementation has several disadvantages:

The last point is one of the most important and least obvious. Suppose that the walk loop state can be initiated in more than one way. In the previous example, it can only be started via the BEGIN state. But, what if there was a RUNNING_JUMP state. For example, suppose the RUNNING_JUMP is the equivalent of a WALKING followed by a JUMP and then back to WALKING. Without a stack, one has to duplicate the functionality of the WALKING state which knows to return to the POST_WALK__JUMP state. An alternative is to add a hacky "in running jump" flag which is checked at the end of WALKING. As you can see, without a stack, things start getting really messy very fast. Iíve seen so many examples of this kind of thing in real game code, there must be a better way!

    01. switch(state) {
    02.     case WALKING:
    03.         if( frame == 10 ) {
    04.             if(jumpingFlag) state = POST_WALK_JUMP;        // YUKKY hack!
    05.             else if(stopFlag) state = STOPPING;
    06.         }
    07.         animObject.frame = 4 + (animObject.frame+1)%7;
    08.         break;

A Solution Which Sucks: Threads

One solution to the kind of pain just demonstrated is to use threads. Threads give us exactly the functionality we want: a stack and a context. However, they suck for games for a variety of reasons.

An example implementation in a thread:

    01. void walk(GameObject animObject ) {
    02.     while( !stopFlag ) {
    03.         for( int i=4; i<=10; i++ ) {
    04.             animObject.frame = i;
    05.             yield();
    06.         }
    07.     }
    08. }
    10. void doWalkAnim( GameObject animObject ) {
    11.     while( 1 ) {
    12.         if( !startFlag ) continue;
    13.         startFlag = stopFlag = false;
    14.         // Transition to walking...
    15.         for( int i=1; i<=3; i++ ) {
    16.             animObject.frame = i;
    17.             yield();
    18.         }
    19.         walk();
    20.     }
    21. }

This is much easier to read and write. It looks like "normal" code. You can use for loops to implement the things which are naturally implemented with for loops, you can use whiles in the same way. Note the use of yield on lines 8 and 14. This tells the kernel that the work is done now and that this thread is willing to give up remaining time.

Also, note that we can use function calls as demonstrated in line 19. This makes encapsulation and functionalization possible, where it is not when using a state machine with a switch statement without a stack.

Again, without going into details, the problem with this approach is synchronization and mutexes. While the demonstrated code is simpler, it is overridden by the complexities and overhead of solving the synchronization problem.

A Better Solution Ė Simulated Threads

What we want is a solution which allows code to be written linearly as demonstrated in a thread, but without the synchronization problems, i.e. without using threads. What do we need in order to do this?

Returning to the sample Controller code.

    01. virtual void doProcess() {
    02.     ...
    03.     if( animObject.frame > 3 ) state = WALKING;
    04.     animObject.frame ++;
    05.     return;
    06. }

The gratuitous return on line 5 is the same thing as a yield Ė it returns control back to the mini-kernel. So, yielding is easy, it is just a return.

Stacks are also easy. Obviously, we can just create push and pop methods in the Process class and inherit them into all processes.

That just leaves storing the ip so we can get back to where we were. It is because we canít store the ip that we created a "state" variable in the first examples. That state variable is like our ip, it tells the process where to go when it comes back into focus. Why canít we just store the ip instead of the state? There is no reason, other than it isnít portable! This is a perfectly legitimate solution if you are willing to forgo portability. It might look something like this (Iím simplifying this greatly):

    01. class WalkAnim : Process {
    02.     WalkAnim() { push(0); }
    03.     virtual void doProcess() {
    04.         int jumpToEIP = pop();
    05.         if(jumpToEIP) {
    06.             jumpToEIP += 3;
    07.             asm jmp [jumpToEIP];
    08.         }
    09.         for( int i=1; i<3; i++ ) {
    10.             animObject.frame = i;
    11.             int temp;
    12.             asm mov [temp], eip
    13.             push(temp);
    14.             return;
    15.             // TADA! A yield!
    16.         }
    17.     ...

Lines 11-14 implement a yield. It stores the instruction pointer on the local stack and returns. (Actually, these exact x86 instructions are illegal, you have to use a stranger syntax involving a stub call, but Iím skipping all that). When the context returns to this process (i.e. the next game frame when the mini-kernel calls this processís doProcess method) the pop on line 4 will see a non-zero jump address and will jump back to where the yieldís instruction pointer was. Of course, we donít want to go there exactly, if we did that we would immediately return again. We want to skip the return on line 14 so we have to add the number of bytes it takes to encode the return in x86 which weíll suppose is 3. Thatís why thereís an add of "3" on line 6.

So, this does work, a few macros like YIELD and INITIALIZE_PROCESS would clean it up a lot. For example:

    01. virtual void doProcess() {
    03.     for( int i=1; i<3; i++ ) {
    04.         animObject.frame = i;
    05.         YIELD;
    06.     }
    07.     ...

But, there is another big problem with this sample. Line 3 in this example declares a local variable "int i". This wonít work! The first time through, i will get set with "1". When the yield occurs, the stack frame which contains the value of i will be lost. When the context returns next game frame, that stack is long gone, the value of i is now garbage! Thus, you CAN NOT use local variables to store state that you expect to be persistent across a yield call! But, this is not a problem Ė simply move the declaration of i intto the class. That is, make is a class data member instead of a local variable.

    01. class WalkAnim : Process {
    02.     int i;
    03.     WalkAnim() { push(0); }
    04.     virtual void doProcess() {
    05.         INITIALIZE_PROCESS;
    06.         for(i=1; i<3; i++ ) {
    07.             animObject.frame = i;
    08.             YIELD;
    09.         }
    10.         ...

There, now it works. Only problem is that it isnít portable due to the Intel assembly used to store the instruction pointer. If only there was some way to make this portable.

Making it Portable

There are two ANSI C functions which lie in the ample heaps of C standard library obscurity: setjmp and longjmp. They work like a fork; the return value of a setjmp tells you if you have returned here because you just set the jump point (like a parent in a fork) or because you have jumped back to this point (like a child fork). For example, here is an interesting implementation of an infinite loop:

    01.     jmp_buf mark;
    02.     if( setjmp( mark ) == 0 ) {
    03.         printf( "Set the jump point.\n" );
    04.     }
    05.     else {
    06.         printf( "Back here again\n" );
    07.     }
    08.     longjmp( mark, 1 );

The output of this program will be one line of "set the jump point" followed by an infinite repeat of "back here again".

So, obviously all we have to do is change our INITIALIZE_PROCESS and YILED macros like this:

    #define INITIALIZE_PROCESS if( pop() ) longjmp( mark, 1 );
    #define YILED if( setjmp( mark ) ) { push(1); return; }

However, we do need to add a "jmp_buf mark" to the Process data variables. There is a catch here: the jmp_buf structure may be large in some implementations of these calls. So, if space is a problem, I suggest using the non-portable solution (i.e. re-implement these macros on each platform Ė not that big of a deal).

Function Calls

One last thing, I promised that we would be able to do function calls with this system like we did with a thread. Implementing this is a little tricky. We need to store the address of the state function and have the kernel call it. We will need a "CALL" macro which pushes the current state function on the local stack and changes the pointer to the new address. We will also need to deal with function arguments which is going to be tricky due to the fact that arguments are within the stack frame and, as weíve already pointed out, we canít expect stack variables to be persistent across a YIELD so we canít expect them to be persistent across a CALL either. Unfortunately, C++ syntax makes this particularly difficult since "member pointers" are type checked very stringently in most compilers. It would take another paper to describe how to get around this, so I simply refer you to my implementation of this. [ZS2]

Giant Disclaimer

This implementation suffers from a major problem: compiler optimizations. The jump at the beginning of the function may skip over the initialization of certain variables, especially those that are aliased into registers. This implies that code in such state machines can never be optimized which is a major drawback, although in many cases such state machines are not speed critical. This implementation has been tested only under Win32 MSDEV C++ using DEBUG (non-optimizing) code generation.


[ZS1] Game Patterns.

[ZS2] Mini-kernel Sample Implementation.