Weird crashes

From: Ron Cole (rcole@ezy.net)
Date: 09/05/02


A recent hard drive crash got us a fresh start on Red Hat 7.3.  Our mud is an
older Circle with many modifications over the last 8 years or so.  We
gained 2 new annoying problems with the latest setup.  First, core files now
carry an extension with (I'm guessing) the pid of the program, e.g. core.30232,
which means they quickly gobble up disk space if you don't stay on top of
them.  Web searches haven't turned up anything on how to make this go back to
the old behavior.  Any ideas?

Secondly, we went from 7-10 day uptimes to several crashes a day with no code
changes.  They are all crashes in memory allocation calls, which makes no sense
to me: if the memory allocation code were buggy, the whole system would be
unstable.  Here's a sample backtrace from one of the cores.

GNU gdb Red Hat Linux (5.2-2)
This GDB was configured as "i386-redhat-linux"...
Core was generated by `bin/newastro 2447'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2

#0  0x400e3bca in chunk_alloc () from /lib/libc.so.6
#1  0x400e5437 in calloc () from /lib/libc.so.6
#2  0x080a6d53 in write_to_q (
    txt=0xbffff030 "alias wearcloak rem cloak scabbard; rem amulet; wear cloak; put amulet scabbard",
    queue=0x9f96700, aliased=0) at comm.c:926
#3  0x080a7bdf in process_input (t=0x9f95dd8) at comm.c:1411
#4  0x080a55ba in game_loop (mother_desc=3) at comm.c:563
#5  0x080a4a63 in init_game (port=2447) at comm.c:267
#6  0x080a47f7 in main (argc=2, argv=0xbffffb44) at comm.c:214
#7  0x400871c4 in __libc_start_main () from /lib/libc.so.6

Here's line 926 in comm.c.

926       CREATE(new->text, char, strlen(txt) + 1);

Here's the CREATE macro.

#define CREATE(result, type, number)  do {\
        if (!((result) = (type *) calloc ((number), sizeof(type))))\
                { perror("malloc failure"); abort(); } } while(0)
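
To make the expected behavior concrete, here is a minimal standalone sketch of
how the macro is used on this code path.  The struct and helper names
(txt_block_sketch, copy_text) are made up for illustration and are not the
actual comm.c code; only the CREATE macro is copied from above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CREATE exactly as quoted above. */
#define CREATE(result, type, number)  do {\
        if (!((result) = (type *) calloc ((number), sizeof(type))))\
                { perror("malloc failure"); abort(); } } while(0)

/* Hypothetical stand-in for a queue element; not the real structure. */
struct txt_block_sketch {
  char *text;
};

/* Hypothetical helper that mirrors the allocation done at comm.c:926:
 * allocate room for the string plus its terminator, then copy it in. */
static struct txt_block_sketch *copy_text(const char *txt)
{
  struct txt_block_sketch *new_block;

  CREATE(new_block, struct txt_block_sketch, 1);
  CREATE(new_block->text, char, strlen(txt) + 1);
  strcpy(new_block->text, txt);
  return new_block;
}
```

Calling copy_text("wear cloak") should hand back a block holding a copy of the
string.  If calloc ever returned NULL, the macro would print "malloc failure"
via perror and call abort(), so the process would die on SIGABRT rather than
with a segfault inside calloc itself.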

If we were out of memory (which should not be possible; the leaks were plugged
a while ago), calloc should have returned NULL and CREATE should have aborted
with the error message instead of segfaulting.  It's a small bit of memory, so
finding contiguous memory should not have been a problem.  I'm at a loss to
explain this.  All the crashes are similar, but they happen in many different
places in the code, all during memory allocation.
Help?

Thanks,
Ron


--
   +---------------------------------------------------------------+
   | FAQ: http://qsilver.queensu.ca/~fletchra/Circle/list-faq.html |
   | Archives: http://post.queensu.ca/listserv/wwwarch/circle.html |
   | Newbie List:  http://groups.yahoo.com/group/circle-newbies/   |
   +---------------------------------------------------------------+



This archive was generated by hypermail 2b30 : 06/25/03 PDT