Re: gdb man page (NOT on unix)

From: Sammy (Samedi@cris.com)
Date: 01/17/97


On Fri, 17 Jan 1997 BuckFloyd@aol.com wrote:

- Umm.. I'm having a NASTY crash bug that seems to 
- occur most often after zone_reset.. I've got most of the
- functions in heartbeat() in comm.c flagged to log() 
- where they are at in the code.. no avail.. I can't track
- it down.. so, I'm going to compile with gdb enabled.
- 
- My problem?  I'm not on UNIX.  I'm on OS/2, running the emx
- ports of the more-common unix utils, including gcc and gdb.
- I don't have any man pages for gdb, so i'm kinda stuck.
- 
- Any pages/docs available on the net, or should I go avail
- myself of a UNIX book from the store? (My unix skills are 
- limited to "ls, cd, ps, and grep").

GDB has some online help, tho it's not the best.  It does at least give a
summary of commands and what they're supposed to do.  I just mailed
someone on some of my tricks.  Here's a short intro to gdb and then my
note on expanded bughunting.

If you've got a core file, go to your top circle directory and type:

> gdb bin/circle lib/core

If you want to hunt bugs in real time (causing bugs to find the cause as
opposed to checking a core to see why the mud crashed earlier) use:

> gdb bin/circle

If you're working with a core, gdb should show you where the crash
occurred.  If you get an actual line that failed, you've got it made.  If
not, the included message should help.  If you're working in real time,
now's the time to crash the mud so you can see what gdb catches.

When you've got the crash info, you can type "where" to see which function
called the crash function, which function called that one, and so on all
the way up to main().

I should explain about "context".  You may type "print ch" which you would
expect to show you the ch variable, but if you're in a function that
doesn't get a ch passed to it (real_mobile, etc), you can't see ch because
it's not in that context.  To change contexts (the function levels you saw
with where) type "up" to go up.  You start at the bottom, but once you go
up, and up, and up, you can always go back "down".  You may be able to go
up a couple functions to see a function with ch in it, if finding out who
caused the crash is useful (it normally isn't).
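
A minimal sketch of the context idea, using made-up code rather than the real CircleMUD source: the struct layout and mob_index_for() are assumptions for illustration, and real_mobile() here is only a stand-in with the same name.

```c
/* Hypothetical, cut-down character structure (not the real structs.h). */
struct char_data {
    int nr;                      /* assumed: the mob's vnum */
};

/* Stand-in for a low-level function that never receives ch. */
int real_mobile(int vnum)
{
    /* If gdb is stopped on this line, "print ch" fails with
     *   No symbol "ch" in current context.
     * because ch is not visible in this frame. */
    return vnum - 1;
}

/* One frame up: here ch is a parameter, so "print *ch" works. */
int mob_index_for(struct char_data *ch)
{
    return real_mobile(ch->nr);
}
```

Stopped inside real_mobile(), "where" would list real_mobile first and mob_index_for above it. One "up" moves to mob_index_for, where ch can be printed, and "down" goes back.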

The "print" command is probably the single most useful command, and lets
you print any variable, and arithmetic expressions (makes a nice
calculator if you know C math).  Any of the following are valid and
sometimes useful:

print ch (fast way to see if ch is NULL; it shows 0x0 if it is)
print *ch (prints the contents of ch, rather than the pointer address)
print ch->player.name (same as GET_NAME(ch))
print world[ch->in_room].number (vnum of the room the char is in)

etc..

Note that you can't use macros (all those handy pseudo functions like
GET_NAME and GET_MAX_HIT), so you'll have to look up the full structure
path of variables you need.
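
To show what "the full structure path" means, here is a sketch with simplified, hypothetical versions of the structures and of GET_NAME (the stock macro in utils.h is more involved, and world is set up differently in the real code). The point is only that the macro is shorthand for a path you can type into gdb by hand.

```c
/* Hypothetical, cut-down stand-ins for the real definitions. */
struct char_player_data {
    char *name;
};

struct char_data {
    int in_room;                         /* index into world[] */
    struct char_player_data player;
};

struct room_data {
    int number;                          /* the room's vnum */
};

/* Assumed two-room world, just so the example is complete. */
struct room_data world[2] = { { 3001 }, { 3002 } };

/* Simplified macro: expands to the path gdb needs spelled out. */
#define GET_NAME(ch) ((ch)->player.name)

/* Same value as "print ch->player.name" in gdb. */
const char *name_of(const struct char_data *ch)
{
    return GET_NAME(ch);
}

/* Same value as "print world[ch->in_room].number" in gdb. */
int room_vnum_of(const struct char_data *ch)
{
    return world[ch->in_room].number;
}
```

When you need the path behind some other macro, its #define will show you what to type after "print".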

Type "list" to see the source before and after the line you're currently
looking at.  There are other list options but I'm unfamiliar with them.

Hmm that's all I can think of for basics.  Usually when you get a crash,
you'll see the line where the crash occurred, and printing all the
variables on that line will come up with at least one null pointer, which
means you probably forgot a check for null.  I think I'm getting into some
of the topics in the attached message so I'll move on to that now :)

Sam


- I am having a lot of crashes recently, and anything you can tell me in 
- how to track them down with gdb would be helpful.  One problem I am 
- encountering with gdb a lot is that it doesn't know where it crashed 
- (just gives empty brackets) and gives a ?? for the files.  Any idea what 
- this could be?  I have no idea how to use gdb really effectively, and 
- trying to track down these crash bugs is a pain in the butt.  Or, if you 
- can just point me in the direction of some manuals or something that 
- would be useful to learn from, I would be most appreciative.  Tanks man.

There's only a couple of commands I use in gdb, though with some patience
they can be very powerful.  The only commands I've ever used are:

run			well, duh :P
print <variable>	also duh, tho it does more than you might think
list			shows you the source code in context
break <function>	set a breakpoint at a function
clear <function>	remove a breakpoint
step			execute one line of code
cont			continue running after a break or ctrl-c

I've run into those nasty problems you mentioned quite a few times.  The
cause is a memory problem, usually with pointers.  I think the most common
cause is pointers to nonexistent memory.  If you free a structure, or a
string or something, the pointer isn't always set to NULL, so you may have
code that checks for a NULL pointer that thinks the pointer is ok since
it's not NULL.  You should make sure you always set pointers to NULL after
freeing them.
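
One way to make that a habit is a small macro. This is a sketch only: FREE_AND_NULL, struct mob_proto and clear_name() are hypothetical names, not part of stock CircleMUD.

```c
#include <stdlib.h>

/* Hypothetical helper: free the memory, then NULL the pointer so a
 * later "if (p)" check tells the truth. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical structure with one allocated string. */
struct mob_proto {
    char *name;
};

/* Safe to call more than once: after the first call m->name is NULL,
 * and free(NULL) does nothing. */
void clear_name(struct mob_proto *m)
{
    FREE_AND_NULL(m->name);
}
```

Note that this only fixes the pointer you pass in. Any other pointer that was copied from it still points at the freed memory and has to be cleared separately.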

Ok now for the hard part.  If I remember right, this was a problem with
medit, right?  If so, then you can probably duplicate it by using a
specific sequence of actions.  That makes things much easier.  What you'll
have to do is pick a function to "break" at.  The ideal place to break is
immediately before the crash.  If I remember right, the crash was when you
saved mobs, so you might be able to "break mobs_to_file".  Try that one
out first.

When you medit save, the mud will hang.  GDB will either give you segfault
info, or it will be stopped at the beginning of mobs_to_file.  If it
segfaulted, pick an earlier function, like copy_mobile, or even do_medit.

When you hit a breakpoint, print the variables that are passed to the
function to make sure they look ok.  Note that printing the contents of
pointers is possible with a little playing around.  For example, if you
"print ch", you get a hex number that shows you the memory location where
ch is at.  It's a little helpful, but try "print *ch" and you'll notice
that it prints the contents of the ch structure, which is usually more
useful.  "print ch->player" will give you the name of the person who
entered the command you're looking at, and some other info.  If you get
'No symbol "ch" in current context.' it's because the ch variable wasn't
passed to the function you're currently looking at.

Ok so now you're ready to start stepping.  When GDB hit your breakpoint,
it showed you the first line of executable code in your function, which
will sometimes be in your variable declarations if you initialized any
variables (ex: int i = 0).  As you're stepping through lines of code,
you'll see one line at a time.  Note that the line you see hasn't been run
yet.  It's actually the _next_ line to be executed.  So if the line is
"a = b + c;", printing a will show you what a was before this line, not
the sum of b and c.
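
A tiny function to try this on, assuming it is compiled with -g and without optimization; sum_demo() is a made-up name.

```c
/* "break sum_demo" stops at the initialized declaration, since that
 * is the first line in the function that does any work. */
int sum_demo(int b, int c)
{
    int a = 0;

    /* While gdb is displaying the next line, it has not run yet:
     * "print a" still shows 0.  Type "step" once and print again
     * to see the sum. */
    a = b + c;
    return a;
}
```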

If you have an idea of where the crash is occurring, you can keep stepping
till you get to that part of the code (tip: pressing return will repeat
the last GDB command, so you can type step once, then keep pressing return
to step quickly).  If you have no idea where the problem is, the quick and
dirty way to find your crash is to keep pressing return rapidly (don't
hold the return key or you'll probably miss it).  When you get the seg
fault, you can't step any more, so it should be obvious when that happens.

Now that you've found the exact line where you get the crash, you should
start the mud over and step more slowly this time. What I've found that
works really well to save time is to create a dummy function.  This one
will work just fine:

void dummy(void){}

Put that somewhere in the file you're working on.  Then, right before the
crash, put a call to dummy in the code (ex: "dummy();").  Then set your
breakpoint at dummy, and when you hit the breakpoint, step once to get back
to the crashing code.
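
Put together, the dummy() trick might look like the sketch below. struct obj and name_length() are hypothetical stand-ins for whatever code is actually crashing.

```c
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
struct obj {
    const char *name;
};

/* Does nothing; it exists only so gdb has somewhere to stop. */
void dummy(void) {}

size_t name_length(const struct obj *o)
{
    size_t n = 0;

    dummy();                        /* "break dummy" stops here */

    /* Suspect line: crashes if o or o->name is a bad pointer. */
    while (o->name[n] != '\0')
        n++;

    return n;
}
```

In gdb that is "break dummy", "run", then stepping until you are back in name_length() on the suspect line. Build without optimization, since an optimizing compiler may inline or drop an empty function, and take the dummy() call out again once the bug is fixed.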

Now you're in total control.  You should be looking at the exact line that
gave you the crash last time.  Print *every* variable on this line.
Chances are one of them will be a pointer to an inaccessible memory
location.  For example, printing ch->player.name may give you an error.
If it does, work your way back and print ch->player to make sure that
one's valid, and if it isn't, try printing ch.

Somewhere in there you're going to have an invalid pointer.  Once you know
which one it is, it's up to you to figure out why it's invalid.  You may
have to move dummy() up higher in the code and step slowly, checking your
pointer along the way to see where it changes from valid to invalid.  You may
just need to NULL a free'd pointer, or you may have to add a check for a
NULL pointer, or you may have screwed up a loop.  I've done all that and
more :)
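
For the "add a check for a NULL pointer" case, the fix usually mirrors the order you printed things in gdb: outermost pointer first, then the members. A sketch with a hypothetical cut-down structure and a made-up safe_name() helper:

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the real structures. */
struct char_player_data {
    char *name;
};

struct char_data {
    struct char_player_data player;
};

/* Check each link before following it, so a NULL shows up as a
 * placeholder string instead of a seg fault. */
const char *safe_name(const struct char_data *ch)
{
    if (ch == NULL)
        return "(no char)";
    if (ch->player.name == NULL)
        return "(no name)";
    return ch->player.name;
}
```

A check like this only catches NULL. A pointer to memory that was freed but never set to NULL will still get past it.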

Well that's it in a nutshell.  There's a lot more to GDB that I haven't
even begun to learn, but if you get comfortable with print and stepping
you can fix just about any bug.  I spent hours on the above procedure
trying to get my ascii object and mail saving working right, but it could
have taken weeks without gdb.  The only other suggestion I have is to
check out the online gdb help.  It's not very helpful for learning, but
you can see what commands are available and play around with them to see
if you can find any new tools.


+-----------------------------------------------------------+
| Ensure that you have read the CircleMUD Mailing List FAQ: |
|   http://cspo.queensu.ca/~fletcher/Circle/list_faq.html   |
+-----------------------------------------------------------+


