Copyovering on SIG{SEGV,BUS,..}

From: Erwin S. Andreasen (erwin@PIP.DKNET.DK)
Date: 04/30/98


On Wed, 29 Apr 1998, George wrote:


>
> >ObCircle: Has anyone done anything like catching SIGSEGVs, writing a
> >copyover file (assuring certain integrity first), and rebooting that
> >way?  No lost connections.  This is a simplified version of using
> >mprotect to ensure safe saves when segment violations happen, and
> >would be a fairly easy first-step.
>
> Erwin was already trying to do something like that with shared memory
> segments so you wouldn't even have to re-load the world into memory.  I

It was actually someone whose name I cannot remember (he is creating a
MUD named Eternity) who took it to the extreme: nearly everything was in
shared memory, so a reboot was just reloading an X MB executable, which
took one second. Not noticeable at all.

I do use shared memory, but on a minor scale: so far I've moved help
files, notes and virtual text file system entries to shared memory, which
make up about 3 MB out of 30. It's dangerous, since messing up some of the
data in shared memory can lead to endless crash loops (the bad data
survives the reboot) :) I've thus only done it with data I know is
created by theoretically bug-free code.
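A minimal sketch of the shared-memory idea using System V shm (shmget/shmat), assuming a fixed key that stays the same across reboots (ftok() on the MUD directory is one way to get one). The name attach_text_segment is hypothetical, not code from the post. Anything stored in the segment should use offsets rather than pointers, since the segment may be attached at a different address after the restart.

```c
#define _XOPEN_SOURCE 700   /* SysV IPC is an XSI interface */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/*
 * Attach the shared memory segment named by `key`, creating it if it
 * does not exist yet. The segment outlives the process, so after a reboot
 * the new MUD image can re-attach it instead of reloading from disk.
 * On success returns the address and sets *shmid; *created is 1 when the
 * segment is brand new (zero-filled, still to be loaded). NULL on failure.
 */
static void *attach_text_segment(key_t key, size_t size,
                                 int *shmid, int *created)
{
    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

    *created = (id != -1);
    if (id == -1 && errno == EEXIST)   /* left behind by the previous run */
        id = shmget(key, size, 0600);
    if (id == -1)
        return NULL;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1)
        return NULL;
    *shmid = id;
    return addr;
}
```

On a reboot, created == 0 tells the MUD the text is already in place and loading can be skipped. That is also exactly the case where corrupted data gets picked up again, which is where the crash loops come from.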

> don't know how far along he progressed.  The easiest method would be to
> make a SIGSEGV handler just call copyover.  Extra things probably
> necessary.

Yeah, as someone else said, there are a few necessary changes.

1 - Make copyover not require any Character, since a signal handler has
no player to pass it.

2 - Only a few functions are safe to call in a signal handler. It's OK
to exec, but fopen etc. may be dangerous: say the MUD crashes while in
malloc, leaving the malloc state inconsistent. You then call fopen, which
calls malloc, and.. oops. To be safe, you could rewrite the code to use
pure system calls: open/write/close etc.
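To illustrate the "pure system calls" approach, here is a sketch that writes a copyover file using only open/write/close, formatting numbers by hand because the printf family is not async-signal-safe. The struct descriptor layout, the write_copyover_raw name and the "<fd> <name>" line format are assumptions for the example, not CircleMUD's actual structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, trimmed-down descriptor list (a real MUD's is larger). */
struct descriptor {
    int fd;                    /* the player's socket */
    const char *name;          /* character name, NULL if not logged in */
    struct descriptor *next;
};

/* write() the whole buffer, retrying after short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hand-rolled length and number formatting, so the writer depends on
   nothing but system calls. */
static size_t raw_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

static size_t fmt_uint(char *out, unsigned v)
{
    char tmp[12];
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Write one "<fd> <name>\n" line per descriptor using only open/write/close. */
static int write_copyover_raw(const char *path, const struct descriptor *list)
{
    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0)
        return -1;
    for (const struct descriptor *d = list; d != NULL; d = d->next) {
        char num[16];
        size_t n = fmt_uint(num, (unsigned)d->fd);
        const char *name = d->name ? d->name : "-";
        num[n++] = ' ';
        if (write_all(out, num, n) != 0 ||
            write_all(out, name, raw_len(name)) != 0 ||
            write_all(out, "\n", 1) != 0) {
            close(out);
            return -1;
        }
    }
    return close(out);
}
```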

There's also the possibility of the linked list of descriptors being,
say, crosslinked. For that you should probably increment a counter after
writing each entry and abort if you suddenly find you have 40000
players.. :)
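A sketch of the counter check, assuming a hypothetical MAX_PLAYERS limit and a singly linked descriptor list like the one above. A crosslinked (cyclic) list trips the limit instead of looping forever inside the handler; the negative-fd test is an extra, equally hypothetical sanity check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sanity limit: set it well above your real peak player count. */
#define MAX_PLAYERS 1000

struct descriptor {
    int fd;
    struct descriptor *next;
};

/*
 * Walk the descriptor list, giving up if it looks corrupt: more entries
 * than MAX_PLAYERS (e.g. a crosslinked, cyclic list that would otherwise
 * loop forever in the crash handler) or an impossible fd value.
 * Returns the number of descriptors, or -1 if the list is not trustworthy.
 */
static int count_descriptors_checked(const struct descriptor *list)
{
    int n = 0;
    for (const struct descriptor *d = list; d != NULL; d = d->next) {
        if (d->fd < 0 || ++n > MAX_PLAYERS)
            return -1;
    }
    return n;
}
```

In practice the same check would sit inside the loop that writes the file, so a bad list aborts the copyover (falling back to a plain crash) instead of hanging.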

As someone else said - it might also be interesting to write the
copyover.dat file EACH time the player state changes, when you know you
are safe.
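Writing the file from normal code means stdio is safe to use again. This sketch (save_copyover_state and struct player are hypothetical) writes to a temporary file and rename()s it into place, so a crash halfway through a save still leaves the previous complete file. POSIX rename() replaces the target atomically when both names are on the same filesystem.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-player record; a real MUD would pull this from its
   descriptor list. */
struct player {
    int fd;
    const char *name;   /* must not be NULL */
};

/*
 * Rewrite the copyover file from normal (non-signal) context, e.g. after
 * every login, logout or quit. The data goes to "<path>.tmp" first and is
 * then renamed over the real file, so a crash in the middle of a save
 * leaves the previous complete file in place.
 */
static int save_copyover_state(const char *path, const struct player *p, int count)
{
    char tmp[256];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || len >= (int)sizeof tmp)
        return -1;

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    for (int i = 0; i < count; i++)
        fprintf(fp, "%d %s\n", p[i].fd, p[i].name);

    int failed = ferror(fp);
    if (fclose(fp) != 0 || failed) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path);   /* atomic replacement on POSIX systems */
}
```

With the file always current, the crash handler is left with little more than the exec call itself.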

Oh, you must also remember to remove the signal handler (reset it to
SIG_DFL) for the signal you catch, unless the handler is one-shot.
Otherwise a failed copyover attempt can leave you looping in the handler.
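With sigaction() the one-shot behaviour can be requested explicitly through SA_RESETHAND, which restores SIG_DFL on entry to the handler. The sketch below assumes that approach; crash_handler is only a placeholder that records the signal, where a real one would write the copyover file and exec. Calling sigaction() with SIG_DFL as the handler's first step would achieve the same thing by hand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t crash_signal = 0;

/* Placeholder: a real handler would write the copyover file with raw
   system calls and exec the MUD binary. This one only records the signal. */
static void crash_handler(int sig)
{
    crash_signal = sig;
}

/* Install `handler` for `sig` as a one-shot handler: SA_RESETHAND makes
   the system restore SIG_DFL on entry to the handler, so a second fault
   while copying over is fatal instead of re-entering the handler. */
static int install_one_shot(int sig, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(sig, &sa, NULL);
}

/* Cover the usual crash signals. */
static int install_crash_handlers(void)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGFPE };
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (install_one_shot(sigs[i], crash_handler) != 0)
            return -1;
    return 0;
}
```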

An alternative way of doing this is to have a 'Watcher' binary which
starts up, then forks to run the MUD. The file descriptors are exchanged
between the two processes over a UNIX domain socket. When the MUD goes
down, the Watcher still has the sockets; it restarts the MUD and passes
the sockets back to it.
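Passing descriptors between processes over a UNIX domain socket is done with sendmsg()/recvmsg() and an SCM_RIGHTS control message. A minimal sketch of the two halves follows (send_fd and recv_fd are hypothetical names). The Watcher needs this because connections the MUD accepts after the fork exist only in the MUD's descriptor table, whereas sockets the Watcher opened before forking are inherited by the child anyway.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected UNIX domain socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';   /* at least one byte of ordinary data must go along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm != NULL && cm->cmsg_level == SOL_SOCKET &&
        cm->cmsg_type == SCM_RIGHTS && cm->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

In the Watcher, each received descriptor would simply be kept open while the MUD restarts and then handed back with send_fd() over the new MUD's socket.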

This could be useful if you need to shut down the MUD for, say, 5
minutes: the Watcher could then be a mini-MUD in itself, allowing
communication, for instance.

One last thing: cores. To get a proper core file, you'll have to fork
in the handler and return in the forked copy (that usually leaves a nice
clean core).
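A sketch of the fork-for-a-core trick, assuming it is called from the crash handler: the child restores the default action and re-raises the signal (for SIGSEGV, simply returning would fault again too), so it dies with an image of the crashed process while the parent goes on to the copyover. fork_for_core and child_died_of are hypothetical names. Newer POSIX revisions no longer list fork() as async-signal-safe and provide _Fork() for this use instead, so check what your libc offers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Meant to be called from the crash handler. Returns 0 in the child,
 * the child's pid in the parent, -1 if fork() failed. The child restores
 * the default action for `sig` and re-raises it, so it dies the way the
 * crash would have (leaving a core if core dumps are enabled) with the
 * address space fork() copied. If `sig` is blocked because we are inside
 * its handler, it stays pending and fires as soon as the handler returns.
 */
static pid_t fork_for_core(int sig)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sigaction(sig, &sa, NULL);
        raise(sig);
    }
    return pid;
}

/* Reap `pid`; returns 1 if it was killed by `sig`, 0 otherwise. */
static int child_died_of(pid_t pid, int sig)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}

/*
 * Intended use inside the crash handler (copyover steps omitted):
 *
 *     if (fork_for_core(sig) == 0)
 *         return;          child: the signal fires again, core is dumped
 *     ...parent: write the copyover file with raw syscalls, then exec...
 */
```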


 =============================================================================
Erwin Andreasen   Herlev, Denmark <erwin@pip.dknet.dk>  UNIX System Programmer
<URL:http://www.abandoned.org/drylock/>     <*>         (not speaking for) DDE
 =============================================================================


     +------------------------------------------------------------+
     | Ensure that you have read the CircleMUD Mailing List FAQ:  |
     | http://democracy.queensu.ca/~fletcher/Circle/list-faq.html |
     +------------------------------------------------------------+



This archive was generated by hypermail 2b30 : 12/15/00 PST