Re: Copyovering on SIG{SEGV,BUS,..}

From: James Turner (turnerjh@XTN.NET)
Date: 04/30/98


"Erwin S. Andreasen" <erwin@pip.dknet.dk> writes:

> I do use shared memory, but on a minor scale: I've so far moved help
> files, notes and virtual text file system entries to shared memory, which
> makes up about 3 MB out of 30. It's dangerous, since f*** up some stuff in
> the shared memory data can lead to endless crash loops :) I've thus only
> done it with data I know are created by theoretically bugless code.

I like this idea... I think I'll look into implementing it on my
codebase.  Would it be possible to use a dropfile, ".badshared", whose
presence tells the code not to re-use the shared memory?  Always create
this file as one of the first things at startup, then remove it once the
shared memory data passes some kind of validity check.
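
A rough sketch of the dropfile idea, using plain system calls.  Only the
".badshared" name comes from the text above; the function names are made
up for illustration, and the validity check itself is left to the caller.

```c
#include <fcntl.h>
#include <unistd.h>

#define BADSHARED_FILE ".badshared"   /* dropfile name from the post */

/* Returns 1 if a previous boot left the marker behind, meaning the
 * shared memory was never validated and should not be re-used. */
int shared_memory_suspect(const char *marker)
{
    return access(marker, F_OK) == 0;
}

/* Create the marker; meant to be one of the first things done at boot. */
int mark_shared_unverified(const char *marker)
{
    int fd = open(marker, O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    return close(fd);
}

/* Remove the marker once the shared data has passed its validity check
 * (or has been rebuilt from disk). */
int mark_shared_verified(const char *marker)
{
    return unlink(marker);
}
```

At boot the order would be: if shared_memory_suspect(BADSHARED_FILE)
returns 1, rebuild from disk instead of attaching; otherwise call
mark_shared_unverified(), run the check, and call mark_shared_verified()
only if it passes.  A crash during the check leaves the file in place, so
the next boot skips the shared data rather than looping.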

> 2 - There are only a few functions that are safe in a signal handler. It's
> OK to exec, but fopen etc. may be dangerous - say the MUD crashes while in
> malloc, and the malloc state is inconsistent. You then call fopen which
> calls malloc and.. oops. To be safe, you could rewrite the code to use
> pure system calls, write/open/close etc.

Do you have a reference that lists which functions are safe to call
from a signal handler?  I'm assuming Advanced Programming in the Unix
Environment would, but unfortunately my copy got purloined long ago.
A definitive online reference would be ideal (preferably something on
the POSIX level instead of OS level).
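
For illustration, a handler restricted to calls that POSIX lists as
async-signal-safe (open, write, close, signal, raise) might look like the
sketch below.  The file name and function names are hypothetical, and the
message is a fixed string so that no formatting or allocation is needed.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file name, not from the original post. */
#define CRASH_NOTE_FILE "crash.note"

/* Write 'len' bytes of 'msg' to 'path' using only open/write/close, so
 * nothing here touches malloc or stdio buffers.  Returns 0 on success. */
int write_crash_note(const char *path, const char *msg, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, msg, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Handler sketch: record the crash, then let the default action run. */
void crash_handler(int sig)
{
    static const char msg[] = "fatal signal caught\n";

    write_crash_note(CRASH_NOTE_FILE, msg, sizeof msg - 1);
    signal(sig, SIG_DFL);   /* restore the default disposition */
    raise(sig);             /* re-raise so the process still dies */
}
```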

> As someone else said - it might also be interesting to write the
> copyover.dat file EACH time the player state changes, when you know you
> are safe.

I'm doing this right now.  It is working fairly well so far, but I
only did it about two hours ago, so the jury is still out.  To get it
to work, I made a changeState function, then used emacs to find all
writes using either STATE or d->connected and send them through the
wrapper function. The wrapper sets a flag (dirty_state) if any change
takes a player from or to the CON_PLAYING state.  At the end of
game_loop, the copyover file is written if the dirty_state is set.
The more players online, the more often this file will be written (and
the larger it will be).  I don't think it will have any noticeable
impact on resource usage, but I can't be certain.
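
The wrapper described above could be sketched roughly as follows.  The
struct, the CON_MENU and CON_CLOSE states, the numeric values, and the
write stub are stand-ins for illustration; only changeState, dirty_state,
CON_PLAYING and d->connected are names from the description.

```c
#define CON_PLAYING 0   /* value assumed for this sketch */
#define CON_MENU    1   /* hypothetical non-playing state */
#define CON_CLOSE   2   /* hypothetical non-playing state */

/* Minimal stand-in for the real descriptor structure. */
struct descriptor {
    int connected;
};

int dirty_state = 0;       /* set when the copyover file is stale */
int copyover_writes = 0;   /* sketch only: counts stubbed writes */

/* Stub standing in for the real routine that writes the copyover file. */
void write_copyover_file(void)
{
    copyover_writes++;
}

/* All state changes go through here.  The flag is set only when a
 * player moves into or out of CON_PLAYING. */
void changeState(struct descriptor *d, int new_state)
{
    if (d->connected != new_state &&
        (d->connected == CON_PLAYING || new_state == CON_PLAYING))
        dirty_state = 1;
    d->connected = new_state;
}

/* Called at the end of each game_loop pass.  Returns 1 if it wrote. */
int flush_copyover_if_dirty(void)
{
    if (!dirty_state)
        return 0;
    write_copyover_file();
    dirty_state = 0;
    return 1;
}
```

Batching the write at the end of the loop means several logins in one
pass cost a single write rather than one per state change.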

> Oh, you must also remember to remove the signal handler for the signal you
> catch, if the signal handler is not one shot.

Classic System V signal() does this automatically.  BSD signals don't, I
believe, and POSIX sigaction() only does it if you ask with SA_RESETHAND.

> An alternative way of doing this is to have a 'Watcher' binary which
> starts up, then forks to run the MUD. Using a UNIX socket, the file
> descriptors are exchanged between the two processes. When the MUD goes
> down, the Watcher still has the sockets. It restarts the MUD and passes it
> the sockets back.

Perhaps instead of forking (i.e. needing all that extra memory for
code... though that should be shareable), create a new process and use
named pipes for info swapping.  Maybe even have all connections go
through the watcher, passing descriptors to the mud when they are
connected.  This might be a first step in making Circle threaded
(something I still disagree with).
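
For the descriptor-passing part, here is a sketch of handing one
descriptor across a connected Unix-domain socket with SCM_RIGHTS.  As far
as I know a named pipe only carries bytes on most systems, so the
descriptors themselves would still have to travel over a Unix socket as
in the quoted text.  send_fd and recv_fd are hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_control {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor 'fd' over the connected Unix socket 'sock'.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';                  /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from 'sock'.  Returns it, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The received number may differ from the one sent; what is shared is the
underlying open connection, which is why the watcher can keep players
connected while the mud restarts.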

> This could be useful if you need to shutdown the MUD for say, 5 minutes -
> the Watcher could then be a mini-MUD in itself, allowing for, say,
> communication.

Or operate as a simple sign.c (heh or even a complicated one).

> One last thing: cores. You'll have to do a fork (and return in the forked
> copy - that usually leaves a nice clean core) to get a proper core file.

I'm calling abort() in mine at the moment, and it makes clean cores as
well.
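
The two approaches can also be combined: fork, and have the child call
abort() so it leaves the core while the parent carries on (for instance
to do the copyover).  A sketch, with a hypothetical function name; whether
a core file actually appears still depends on the system's core limits.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that immediately abort()s, leaving a snapshot of the
 * process as a core, and wait for it.  Returns the signal that killed
 * the child (normally SIGABRT), or -1 on failure. */
int dump_core_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();               /* child: die with SIGABRT */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```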

Something I did on a previous mud I worked on, and will be doing soon
on my current code, is adding crash tracking.  This will be
particularly nice for systems where writing 20 meg cores isn't really
an option.  Here's what it could do:

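/* Record where we are: source file, line number and a short note. */
#define CHECKPOINT(s) \
  snprintf(state_buf, MAX_STRING_LENGTH, "%s: %d (%s)", \
           __FILE__, __LINE__, (s))

/* Record who issued the last command and what it was. */
#define COMMANDPOINT(ch, s) \
  snprintf(last_command, MAX_STRING_LENGTH, "%s, %s", \
           (ch) ? GET_NAME(ch) : "INVALID CHAR", (s))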

Then you could call these routinely through the code to update the
current checkpoint.  It does help in some cases, though not in all;
mainly in cases where the stack gets overwritten and a backtrace is no
use.  Maybe even use mprotect to write-protect the buffers between
updates, so that they're known to be intact when a segv happens.

--
James Turner               turnerjh@xtn.net
                           http://www.vuse.vanderbilt.edu/~turnerjh/

