Re: Crash Recovery Techniques

From: Daniel A. Koepke (dkoepke@california.com)
Date: 03/01/00


On Wed, 1 Mar 2000, Pat OLaughlin wrote:

>         On my MUD, I implemented a very simple system that will use
> signals to detect when a segmentation fault occurs and directly
> afterwards will automatically "copyover" the MUD.  It also tells the
> last command that was typed and the person who typed it.  This
> system works great but it's not as extensive as I want it.

This comes up every few months, and every time it comes up I say the same
thing: this is not a good idea.  Corrupt data (buffer overflows, etc.)
cause segmentation faults.  After a seg fault, the contents of the
process's memory can no longer be trusted, nor is it entirely clear how
each operating system handles core dumps.
It's very easy, then, to create a situation where the Mud either further
corrupts data (especially data being written to files open for writing
when the crash occurred) or gets into a loop of crashing on corrupt
data.  What you really want to do is have data (world state) persist over
boots of the game engine, and...

Catching a segmentation fault is *NOT* the way to handle this -- the
proper method is to adopt some sort of model of persistence, such as you
would get with MySQL, and store world state separately from the Mud.  A
full-featured persistent database storage system has features for
transaction logging, recovery, and other vital aspects and will be *more*
than suitable for your purposes.
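
A real database gives you far more than this, but the "don't leave a
half-written file behind when the process dies" part can be sketched in
a few lines.  This is only an illustration, assuming a POSIX system: the
world_state struct and the save_world_state()/load_world_state() names
are made up for the example and are not CircleMUD functions.  The idea is
to write the new snapshot to a temporary file, force it to disk, and then
rename() it over the old one, so a crash at any point leaves either the
old snapshot or the new one.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical world-state record; a real Mud would store far more. */
struct world_state {
    long tick;
    int  players_online;
};

/*
 * Write the snapshot to "<path>.tmp", fsync it, then rename it over the
 * old file.  POSIX rename() replaces the target atomically, so readers
 * never see a partially written snapshot.  Returns 0 on success, -1 on
 * failure (in which case the previous snapshot is left untouched).
 */
int save_world_state(const char *path, const struct world_state *ws)
{
    char tmp[512], buf[64];
    int fd, len;

    len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;
    len = snprintf(buf, sizeof buf, "%ld %d\n", ws->tick, ws->players_online);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, (size_t)len) != len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Read back the last snapshot that was completely saved. */
int load_world_state(const char *path, struct world_state *ws)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld %d", &ws->tick, &ws->players_online) == 2);
    fclose(fp);
    return ok ? 0 : -1;
}
```

On boot the engine would call load_world_state() and carry on from
whatever was saved last.  None of this gives you the transaction logging
or recovery tools of a proper database; it only keeps a crash from
trashing the one file you depend on.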

End result?  People would still get disconnected on crash (unless you
moved the networking code and game engine apart, as I've recommended in
times past), but on reconnection the game would appear to be more or less
the same, depending upon when the world state was last updated.

> Another cool feature is the players wouldn't even know the crash
> happened.  The world would be restored to its original state.

Better to spend your time fixing the crash bugs than making them
transparent, and then approach persistence for better reasons (i.e.,
creating a truly dynamic world, rather than a static one which runs in
cycles based upon zone reset).

-dak




