[LONG] Need Opinions on How to Proceed, Serious Errors in Code

From: Mathew Earle Reuther (graymere@zipcon.net)
Date: 10/20/02


I would greatly appreciate the opinions of anyone on this topic.  I'm
currently stuck with a pile of code which has a load of great features in
it and is well on its way to being a system ready for building, then
testing, and then play (a month and a half to 2 months of constant
development, building off 3 or so months of prior
development/skillbuilding), but which is also so bug-ridden that I have no
idea what to do with it.

If you've been paying attention to my emails to the list, you know I've
been having serious, strange, ugly bugs with no explanation.  (Like, the
code looks fine, or is stock, etc.)  In many cases, gdb is telling me it
can't access the memory when I try to print, like this one: (cygwin box)

Program received signal SIGSEGV, Segmentation fault.
perform_dupe_check (d=0x102c7380) at interpreter.c:1462
1462        next_ch = ch->next;
(gdb) bt
#0  perform_dupe_check (d=0x102c7380) at interpreter.c:1462
#1  0x0049d268 in nanny (d=0x102c7380, arg=0x22fc10 "implement1")
    at interpreter.c:1769
#2  0x00444646 in game_loop (mother_desc=3) at comm.c:759
#3  0x00443979 in init_game (port=4000) at comm.c:387
#4  0x00443797 in main (argc=1, argv=0x10031508) at comm.c:332
#5  0x61005b8e in _libkernel32_a_iname ()
#6  0x61005e2c in _libkernel32_a_iname ()
#7  0x005237e2 in cygwin_crt0 ()
#8  0x0040103c in mainCRTStartup ()
#9  0x77e992a6 in _libkernel32_a_iname ()
(gdb) l
1457      * choose one if one is available (while still deleting the other
1458      * duplicates, though theoretically none should be able to exist).
1459      */
1460
1461      for (ch = character_list; ch; ch = next_ch) {
1462        next_ch = ch->next;
1463
1464        if (IS_NPC(ch))
1465          continue;
1466        if (GET_IDNUM(ch) != id)
(gdb) print next_ch
$1 = (struct char_data *) 0x104fd11
(gdb) print ch->next
Cannot access memory at address 0x104fec1
(gdb)
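
For reference, the loop at interpreter.c:1461 saves ch->next before the
body runs, so that the body can remove ch from the list without losing its
place.  Below is a minimal standalone sketch of that pattern.  The names
struct node, push(), remove_matching() and list_length() are hypothetical
stand-ins made up for the sketch, not CircleMUD code.  The pattern only
protects against freeing the current node; if something elsewhere leaves a
stale or overwritten pointer in the list, a line like next_ch = ch->next
is typically where it would surface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct char_data: only what the loop needs. */
struct node {
  long idnum;
  struct node *next;
};

static struct node *list_head = NULL;

/* Add a node at the head of the list. */
static void push(long idnum)
{
  struct node *n = malloc(sizeof(*n));

  assert(n != NULL);  /* sketch only: real code should handle failure */
  n->idnum = idnum;
  n->next = list_head;
  list_head = n;
}

/* Unlink and free every node whose idnum matches; returns how many. */
static int remove_matching(long id)
{
  struct node *n, *next_n, *prev = NULL;
  int removed = 0;

  for (n = list_head; n; n = next_n) {
    next_n = n->next;        /* saved before n can be freed */

    if (n->idnum != id) {
      prev = n;
      continue;
    }
    if (prev)
      prev->next = next_n;   /* unlink first ... */
    else
      list_head = next_n;
    free(n);                 /* ... then free, so no dangling link remains */
    removed++;
  }
  return removed;
}

/* Count the nodes currently in the list. */
static int list_length(void)
{
  const struct node *n;
  int len = 0;

  for (n = list_head; n; n = n->next)
    len++;
  return len;
}
```

The point of the sketch is the ordering: the node is unlinked before it is
freed, so nothing in the list still points at freed memory afterwards.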

Switching to a Redhat 7.2 linux box, linking to efence (which I had hoped
would find me at least a clue to what's wrong) and proceeding gets me
errors like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 12793)]
0x08099f85 in fbgetline (fbfl=0x4afa2fec, line=0xbfffd610 "Exp : 3000000036") at diskio.c:28
28          r++;
(gdb) bt
#0  0x08099f85 in fbgetline (fbfl=0x4afa2fec, line=0xbfffd610 "Exp : 3000000036") at diskio.c:28
#1  0x080848e3 in load_char (name=0xbfffd810 "implementor", ch=0x4af9cdfc) at db.c:2816
#2  0x080b2d6d in nanny (d=0x4b080870, arg=0xbfffdb10 "implementor") at interpreter.c:1623
#3  0x080753d2 in game_loop (mother_desc=5) at comm.c:759
#4  0x08074b18 in init_game (port=7777) at comm.c:387
#5  0x08074a0a in main (argc=1, argv=0xbfffdee4) at comm.c:332
#6  0x40116336 in __libc_start_main (main=0x8074750 <main>, argc=1, ubp_av=0xbfffdee4,
    init=0x8049548 <_init>, fini=0x80d6490 <_fini>, rtld_fini=0x4000d2fc <_dl_fini>,
    stack_end=0xbfffdedc) at ../sysdeps/generic/libc-start.c:129
(gdb) l
23
24        for(; *r && *r != '\n' && r <= fbfl->buf + fbfl->size; r++)
25          *(w++) = *r;
26
27        while(*r == '\r' || *r == '\n')
28          r++;
29
30        *w = '\0';
31
32        if(r > fbfl->buf + fbfl->size)
(gdb) frame 2
#2  0x080b2d6d in nanny (d=0x4b080870, arg=0xbfffdb10 "implementor") at interpreter.c:1623
1623          if ((player_i = load_char(tmp_name, d->character)) > -1) {
(gdb) l
1618            minimum_color = TRUE;
1619          if (PRF_FLAGGED(d->character, PRF_COLOR_2) && !PRF_FLAGGED(d->character, PRF_COLOR_1))
1620            normal_color = TRUE;
1621          if (PRF_FLAGGED(d->character, PRF_COLOR_1) && PRF_FLAGGED(d->character, PRF_COLOR_2))
1622            maximum_color = TRUE;
1623          if ((player_i = load_char(tmp_name, d->character)) > -1) {
1624
1625                GET_PFILEPOS(d->character) = player_i;
1626
1627            if (PLR_FLAGGED(d->character, PLR_DELETED)) {
(gdb)

(Note that the value on that line in the pfile is 3000000, not the
3000000036 that gdb shows.)
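
One thing visible in the diskio.c listing: the for loop at line 24 reads
*r before it tests r against fbfl->buf + fbfl->size, and the while loop at
line 27 does not test the bound at all.  Whether that is the actual cause
of the crash is not established here.  Below is a minimal sketch of the
same line-copy loop with every dereference guarded by a bounds test first.
The struct fbuf, its pos field, and fb_getline_bounded() are hypothetical
names for this sketch, not the real diskio.c interface.  It also differs
from the listing in that it stops copying at '\r' as well as '\n', and
relies on size rather than a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffered-file struct.  buf and size
 * mirror the listing; pos is an assumption made for this sketch. */
struct fbuf {
  const char *buf;   /* start of the file contents */
  size_t size;       /* number of valid bytes in buf */
  size_t pos;        /* current read offset into buf */
};

/* Copy the next line, without its line ending, into line.
 * Returns 1 if a line was read, 0 once the buffer is exhausted. */
static int fb_getline_bounded(struct fbuf *fb, char *line, size_t line_size)
{
  const char *r = fb->buf + fb->pos;
  const char *end = fb->buf + fb->size;   /* one past the last valid byte */
  char *w = line;

  assert(line_size > 0);

  if (r >= end) {
    *line = '\0';
    return 0;
  }

  /* Test the bound first, then dereference. */
  while (r < end && *r != '\n' && *r != '\r') {
    if ((size_t)(w - line) < line_size - 1)   /* leave room for '\0' */
      *w++ = *r;
    r++;
  }

  /* Skip the line ending, again never reading at or past end. */
  while (r < end && (*r == '\r' || *r == '\n'))
    r++;

  *w = '\0';
  fb->pos = (size_t)(r - fb->buf);
  return 1;
}
```

With the checks in this order, a buffer that does not end in a NUL byte
(which is the case Electric Fence is strict about) is never read past its
last valid byte.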

So, I'm viciously stuck.  I have no clue why the code is so massively
corrupt.  The corruption takes some different forms on the different
systems, but it's there.  efence (at least just -lefence in the makefile)
seems to ignore whatever is happening, and I'm such a novice coder that I
just have no clue what to do next!

Help?  Anyone have any advice at all?  If this were you, how would you fix
it?  If you were to give me advice, what would it be?  Please, I'm
looking at a huge amount of work which seems just completely unrecoverable
to me at this moment.  I need to hear what others think of the situation.

Thanks in advance,

-Mathew

--
   +---------------------------------------------------------------+
   | FAQ: http://qsilver.queensu.ca/~fletchra/Circle/list-faq.html |
   | Archives: http://post.queensu.ca/listserv/wwwarch/circle.html |
   | Newbie List:  http://groups.yahoo.com/group/circle-newbies/   |
   +---------------------------------------------------------------+


