Re: Copyovering on SIG{SEGV,BUS,..}

From: Erwin S. Andreasen (erwin@PIP.DKNET.DK)
Date: 04/30/98


On 30 Apr 1998, James Turner wrote:

> "Erwin S. Andreasen" <erwin@pip.dknet.dk> writes:
>
> > I do use shared memory, but on a minor scale: I've so far moved help
> > files, notes and virtual text file system entries to shared memory, which
> > makes up about 3 MB out of 30. It's dangerous, since f***ing up some stuff
> > in the shared memory data can lead to endless crash loops :) I've thus only
> > done it with data I know is created by theoretically bugless code.
>
> I like this idea... I think I'll look into implementing it on my
> codebase.  Would it be possible to make a dropfile, ".badshared", that,
> if present, tells the code not to re-use the shared memory?  Always create
> this file as one of the first things, then remove it if the shared memory
> data passes some kind of validity check.

Well, how do you detect when the shared memory is not valid? Set a dirty
flag each time before modifying it and clear it afterwards?

What I do, BTW, is have functions that acquire a "named" data area in the
shared memory, and use that as the list head for things like notes.

So the code for loading something like the help_list looks like this:

    // See if we already have help files in memory
    if ( ( help_first = (Help**)shared->getData(SharedManager::Help)))
    {
        logf ("Restoring helps from shared memory");
        got_helps = true;
    }
    else
    {
        // What we allocate is a Help*
        // What is returned is a ptr to the block of memory that has Help*
        help_first = (Help**)shared->allocateData(SharedManager::Help, sizeof(Help*));
        got_helps = false;
    }

And then, *help_first is where the head of the help_list is stored.
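The same getData()/allocateData() pattern can be sketched in plain C. This is not Erwin's SharedManager; it assumes a segment that is already mapped (for example with shmget()/shmat()), a fixed list of area IDs, and a simple bump allocator. All names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical area IDs, standing in for SharedManager::Help and friends. */
enum shared_area { AREA_HELP, AREA_NOTES, AREA_VFS, AREA_COUNT };

#define SHARED_ALIGN 16   /* assumed to satisfy every type stored here */

struct shared_header {
    size_t size;                /* total bytes in the segment */
    size_t used;                /* bytes handed out so far, header included */
    size_t offset[AREA_COUNT];  /* offset of each named area, 0 = none yet */
};

static struct shared_header *shared;

/* First boot: lay out an empty header at the start of the segment. */
static void shared_format(void *base, size_t size)
{
    assert(size >= sizeof(struct shared_header));
    shared = base;
    memset(shared, 0, sizeof *shared);
    shared->size = size;
    shared->used = sizeof *shared;
}

/* After a copyover: the segment already holds a header, just adopt it. */
static void shared_attach(void *base)
{
    shared = base;
}

/* Like getData(): the named area, or NULL if it was never allocated. */
static void *shared_get(enum shared_area id)
{
    assert(id < AREA_COUNT);
    return shared->offset[id] ? (char *)shared + shared->offset[id] : NULL;
}

/* Like allocateData(): carve a zeroed, aligned block out of the segment. */
static void *shared_allocate(enum shared_area id, size_t n)
{
    size_t start = (shared->used + SHARED_ALIGN - 1) & ~(size_t)(SHARED_ALIGN - 1);

    assert(id < AREA_COUNT);
    if (start > shared->size || n > shared->size - start)
        return NULL;                       /* segment is full */
    shared->offset[id] = start;
    shared->used = start + n;
    memset((char *)shared + start, 0, n);
    return (char *)shared + start;
}
```

Loading helps then mirrors the C++ above: call shared_get(AREA_HELP), and if it returns NULL, call shared_allocate(AREA_HELP, sizeof(Help *)). The header stores offsets, so the table itself survives the segment being attached at a different address, but raw pointers such as *help_first stay valid only if the segment is attached at the same address after the restart.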

> > 2 - There are only a few functions that are safe in a signal handler. It's
> > OK to exec, but fopen etc. may be dangerous - say the MUD crashes while in
> > malloc, and the malloc state is inconsistent. You then call fopen, which
> > calls malloc and... oops. To be safe, you could rewrite the code to use
> > pure system calls: write/open/close etc.
>
> Do you have a reference to program state validity that contains this
> kind of information?  I'm assuming Advanced Programming in the Unix
> Environment would, but unfortunately my copy got purloined long ago.
> Perhaps an online definitive reference (preferably something on the
> POSIX level instead of OS level).

Sorry, I don't know of an online reference. Stevens' APUE is indeed where I
saw this. Maybe you could look at the Open Group's site
(www.opengroup.org): they have complete man pages for the new Unix
reference, which you can comment on, and maybe they have a list. At least I
think it was TOG; I got a letter about an online Unix reference from them a
while ago.

The table in APUE contains some 84 system calls and some library
functions, so I won't type it in here. It also mentions that the reentrant
functions are listed in the SVR4 Interface Definition books (if you don't
have them, I'm sure my company, DDE, can sell you a set; I think we have
about ten 5-book sets no one uses that we found when cleaning the offices :)

I think Sun man pages are also a good source:

$ man execl
...


ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:

     ______________________________________________________________
    | ATTRIBUTE TYPE|                ATTRIBUTE VALUE              |
    |______________________________________________________________
    | MT-Level      |  execle() and execve() are Async-Signal-Safe|
    |_______________|_____________________________________________|

but man fopen lists it only as MT-Safe, not Async-Signal-Safe:

     __________________________________
    | ATTRIBUTE TYPE|  ATTRIBUTE VALUE|
    |__________________________________
    | MT-Level      |  MT-Safe        |
    |_______________|_________________|

Actually, the attributes(5) man page also contains a complete list of the
async-signal-safe functions, so you can look at that if you have a Sun
machine near you (what you are using now seems Linuxish :)
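To illustrate the point, here is a sketch of a crash handler that sticks to calls on the async-signal-safe list (open, write, close, execve and _exit inside the handler; sigaction only at install time). The log file name "crash.log", the helper names, and the choice to simply re-exec the same binary are assumptions for illustration; a real copyover would also hand over the descriptor information:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

static const char *crash_exe;   /* binary to re-exec (hypothetical) */
static char **crash_argv;       /* its argv, saved at startup */

/* Format v in decimal without stdio; returns the number of digits. */
static size_t fmt_uint(char *buf, unsigned v)
{
    char tmp[12];
    size_t n = 0, i;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Only async-signal-safe calls in here: open, write, close, execve, _exit. */
static void crash_handler(int sig)
{
    static const char prefix[] = "crash: caught signal ";
    char num[12];
    size_t len = fmt_uint(num, (unsigned)sig);
    int fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd >= 0) {
        (void)write(fd, prefix, sizeof prefix - 1);
        (void)write(fd, num, len);
        (void)write(fd, "\n", 1);
        (void)close(fd);
    }
    execve(crash_exe, crash_argv, environ);
    _exit(127);   /* exec failed: never return into the corrupted state */
}

static int install_crash_handler(const char *exe, char **argv)
{
    struct sigaction sa;

    assert(exe != NULL && argv != NULL);
    crash_exe = exe;
    crash_argv = argv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    /* Reset to the default action on entry and do not block the signal, so a
     * second fault kills the process and the exec'd image does not start
     * with SIGSEGV/SIGBUS blocked (the signal mask survives execve). */
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

The number is formatted by hand because snprintf() is not on the async-signal-safe list either; install_crash_handler(argv[0], argv) would be called early in main().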


> > As someone else said - it might also be interesting to write the
> > copyover.dat file EACH time the player state changes, when you know you
> > are safe.
>
> I'm doing this right now.  It is working fairly well so far, but I
[...]

Alternatively, you could stuff that info into a small block of shared
memory, avoiding disk writes. The code should be simple enough that it is
safe.
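A sketch of such a block, assuming it lives in a shared memory segment that survives the exec and that descriptor numbers stay below a fixed limit. Everything is fixed-size and pointer-free, so the restarted MUD can read it without trusting any heap state. COPYOVER_MAX_FDS, COPYOVER_NAME_LEN and the function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

#define COPYOVER_MAX_FDS  256   /* assumed upper bound on descriptor numbers */
#define COPYOVER_NAME_LEN  32   /* assumed maximum player name length + 1 */

/* Lives in shared memory. name[fd][0] == '\0' means no player on that fd. */
struct copyover_block {
    char name[COPYOVER_MAX_FDS][COPYOVER_NAME_LEN];
};

/* Call when a player finishes logging in on descriptor fd. */
static int copyover_set(struct copyover_block *b, int fd, const char *name)
{
    assert(b != NULL && name != NULL);
    if (fd < 0 || fd >= COPYOVER_MAX_FDS)
        return -1;
    /* Bounded copy; the slot is always left NUL-terminated. */
    strncpy(b->name[fd], name, COPYOVER_NAME_LEN - 1);
    b->name[fd][COPYOVER_NAME_LEN - 1] = '\0';
    return 0;
}

/* Call when the descriptor is closed. */
static void copyover_clear(struct copyover_block *b, int fd)
{
    if (b != NULL && fd >= 0 && fd < COPYOVER_MAX_FDS)
        b->name[fd][0] = '\0';
}
```

After the crash-triggered exec, the new process walks every slot and reconnects each non-empty one. A crash in the middle of copyover_set() can at worst leave that one slot with a garbled (but bounded) name, so the reload code should be prepared to reject names it cannot find.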

> > An alternative way of doing this is to have a 'Watcher' binary which
> > starts up, then forks to run the MUD. Using a UNIX socket, the file
> > descriptors are exchanged between the two processes. When the MUD goes
> > down, the Watcher still has the sockets. It restarts the MUD and passes
> > the sockets back to it.
>
> Perhaps instead of forking (i.e. needing all that extra memory for
> code... though that should be shareable), create a new process and use
> named pipes for info swapping.  Maybe even have all connections go
> through the watcher, passing descriptors to the mud when they are
> connected.  This might be a first step in making circle threaded
> (something I still disagree with).

The Watcher would be a separate, small executable. No forking of the main
process would be done; the Watcher would fork and exec the MUD instead.
The file descriptors are simply sent over the socket using the arcane
sendmsg()/recvmsg() calls.
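The msg calls in question are sendmsg() and recvmsg() carrying an SCM_RIGHTS control message. A minimal sketch of both ends, assuming the Watcher and the MUD already share a connected UNIX-domain socket (for example one created with socketpair() before the fork); send_fd() and recv_fd() are hypothetical helper names:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the connected UNIX-domain socket sock. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';   /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    struct cmsghdr *c;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;         /* "this message carries descriptors" */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd number, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The descriptor number on the receiving side usually differs from the sender's, but it refers to the same open connection, so the player stays connected while the MUD restarts.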

> > One last thing: cores. You'll have to do a fork (and return in the forked
> > copy - that usually leaves a nice clean core) to get a proper core file.
>
> I'm calling abort() in mine at the moment, and it makes clean cores as
> well.

Strangely, abort worked poorly for me - perhaps because I called it from a
signal handler.
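The fork-and-return trick can be sketched like this, assuming it is called from the SIGSEGV/SIGBUS handler. The child restores the default action and re-raises the signal: if the handler runs with the signal blocked (the default), the re-raised signal stays pending until the handler returns; otherwise it is delivered at once. Either way the child dies with a core of the crashed state while the parent carries on with the copyover. fork_core_dump() is a hypothetical name, and note that newer POSIX revisions no longer list fork() as async-signal-safe:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for reaping the child later */
#include <unistd.h>

/* Call from the crash handler. Returns the child's pid in the parent (or -1
 * if fork failed) and 0 in the child, which must then return from the
 * handler immediately so the pending signal kills it with a core dump. */
static pid_t fork_core_dump(int sig)
{
    pid_t pid;

    assert(sig > 0);
    pid = fork();
    if (pid == 0) {
        signal(sig, SIG_DFL);   /* default action for SIGSEGV/SIGBUS: core */
        raise(sig);             /* delivered now, or when the handler returns */
    }
    return pid;
}
```

In the handler, "if (fork_core_dump(sig) == 0) return;" goes first, and the parent then writes its copyover data and execs. The exec'd MUD keeps the same pid and therefore inherits the child, so it may want to reap it with waitpid(). Whether a core file actually appears still depends on the core size limit (ulimit -c).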

> Something I did on a previous mud I worked on, and will be doing soon
> on my current code, will be adding crash tracking.  This will be
> particularly nice for systems when writing 20 meg cores isn't really
> an option.  Here's what it could do:
>
> #define CHECKPOINT(s) \
>   snprintf(state_buf, MAX_STRING_LENGTH, "%s: %d (%s)", \
>            __FILE__, __LINE__, (s))
>
> #define COMMANDPOINT(ch, s) \
>   snprintf(last_command, MAX_STRING_LENGTH, "%s, %s", \
>            (ch) ? GET_NAME(ch) : "INVALID CHAR", (s))
>
> Then you could call these routinely through the code to update the
> current checkpoint.  It does help in some cases, though not in all --
> mainly in cases where the stack gets overwritten.  Maybe even use
> mprotect to protect the buffers from writing so that they're ensured
> to be safe when a segv happens.

Yeah, I do this as well
(http://www.abandoned.org/drylock/ftp/short-2.tar.gz)
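A sketch of the mprotect() variant James describes: the checkpoint buffer gets its own page, kept read-only except during the brief window in which checkpoint_set() writes it, so a stray write elsewhere faults instead of silently corrupting the trail. The names and the one-page size are assumptions, MAP_ANONYMOUS is a widespread extension rather than strict POSIX, and the two extra system calls per checkpoint have a real cost:

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *state_buf;     /* one page, read-only outside checkpoint_set() */
static size_t state_len;

static int checkpoint_init(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    state_buf = p;
    state_len = (size_t)page;
    state_buf[0] = '\0';
    return mprotect(state_buf, state_len, PROT_READ);
}

/* Briefly unprotect the page, record the checkpoint, protect it again. */
static void checkpoint_set(const char *file, int line, const char *what)
{
    assert(state_buf != NULL);
    mprotect(state_buf, state_len, PROT_READ | PROT_WRITE);
    snprintf(state_buf, state_len, "%s:%d (%s)", file, line, what);
    mprotect(state_buf, state_len, PROT_READ);
}

#define CHECKPOINT(s) checkpoint_set(__FILE__, __LINE__, (s))

/* Safe inside a SIGSEGV handler: no stdio, just a loop and write(2). */
static void checkpoint_dump(int fd)
{
    size_t n = 0;

    while (n < state_len && state_buf[n] != '\0')
        n++;
    (void)write(fd, state_buf, n);
}
```

The crash handler would call checkpoint_dump() on a log descriptor before doing anything else.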

I also have a checkpoint that prints the current function name
(__FUNCTION__) into a buffer. I once had an sprintf overflowing in some
obscure place, which I finally found by turning sprintf into a macro that
saved the format string and file/line into that buffer before doing the
real sprintf. That slowed the MUD down quite a bit, but it found the bug.
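Roughly what such a sprintf wrapper might look like; a sketch, not the code from the archive linked above. It records file, line and format string before calling the real sprintf, so a crash during an overflowing call leaves the culprit in the buffer. last_sprintf, note_sprintf() and SPRINTF_FMT_ are hypothetical names, and since the C standard reserves library function names, redefining sprintf is a debugging hack rather than something to ship:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_sprintf[256];   /* inspect this in the core or crash handler */

static void note_sprintf(const char *file, int line, const char *fmt)
{
    assert(fmt != NULL);
    snprintf(last_sprintf, sizeof last_sprintf, "%s:%d \"%s\"", file, line, fmt);
}

/* Pick the format string out of the argument list without evaluating the
 * other arguments (only the format is evaluated twice). */
#define SPRINTF_FMT_(fmt, ...) (fmt)

#undef sprintf
#define sprintf(buf, ...)                                            \
    (note_sprintf(__FILE__, __LINE__, SPRINTF_FMT_(__VA_ARGS__, 0)), \
     (sprintf)((buf), __VA_ARGS__))
```

The parenthesized (sprintf) calls the real library function, and the whole expression still yields sprintf's return value, so existing call sites compile unchanged once this sits in a common header.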


PS: Does anyone know what Descriptor::._53::~._53(void) is supposed to be?
nm will not demangle it, I have an undefined reference to it, and I can't
locate the demangling reference for gcc. The mangled name is
_._Q210Descriptor4._53.



 =============================================================================
Erwin Andreasen   Herlev, Denmark <erwin@pip.dknet.dk>  UNIX System Programmer
<URL:http://www.abandoned.org/drylock/>     <*>         (not speaking for) DDE
 =============================================================================





This archive was generated by hypermail 2b30 : 12/15/00 PST