Re: Using lex and yacc for parsing

From: Mark Coletti (markc@dbawdc.com)
Date: 11/01/95


Wout Mertens pounded furiously on the keyboard:

> On Tue, 31 Oct 1995, Mark Coletti wrote:
> > Although CircleMUD has its good sides, it needs some basic things done first
> > before new features are added:
> > 
> > 	o Makefile should not have explicit dependencies; ``makedepend'' or
> > 	  gcc -M should be used to automagically generate them
> > 	o Makefile should rely as much as possible on implicit pattern
> > 	  matching rules to eliminate redundancy and complexity
> Uuh, so why don't you do that? Not that much work... We'll be delighted :)

	Well, now that you mention it, I did this a few days ago!  =8-)

----->8 snip snip snip 8<------
# CircleMUD 3.0 Makefile
#
# $Id: 0009.html,v 1.1 2000/12/19 13:54:03 furry Exp $
#
# This makefile should work with no changes for compiling under SunOS, Ultrix,
# NeXTSTEP, HP/UX, DG/UX, and Linux.  If you're using AIX, IRIX, some version
# of SVR4 or Solaris, uncomment the appropriate lines below.  If you're using
# some other operating system not mentioned, just type 'make' and cross your
# fingers.
#
#	You MUST ``make depend'' first to automatically set up the inter-
#	file dependencies. <1>
#
# Notes:
#	o makefile now dynamically builds dependency lists <1>
#	o makefile complexity reduced by relying on implicit pattern matching
#	  rules instead of explicit production rules
#	o consolidated like macro definitions to reduce complexity
#	o re-formatted file lists to make easier to read
# 
# To Do:
#	o need to set up CircleMUD to work with GNU autoconf utility to
#	  build this makefile so that fingers don't have to be crossed
#	  as much   =8-)
#
# ---
# <1>	you must have ``makedepend'' installed somewhere on your path, natch
#

SHELL    = /bin/sh
INCLUDES = -I.
BINDIR   = ../bin

CC       = gcc

# for AIX
# CC = cc
# for IRIX
# CC = cc -cckr -fullwarn


CFLAGS   = -g -Wall -O

# for AIX
# CFLAGS = -g
# for IRIX
# CFLAGS = -g



LIBS     =

# SVR4 or Solaris
# LIBS = -lsocket -lnsl
# IRIX
# LIBS = -lmalloc


DEFS     =


# flags for profiling (see hacker.doc for more information)
#
# (just add this to the implicit rule below if you want profiling)

PROFILE = -pg



# implicit rule for building object files

%.o : %.c
	$(CC) -c $(CFLAGS) $(DEFS) $(INCLUDES) $< 


##############################################################################
# Do Not Modify Anything Below This Line (unless you know what you're doing) #
##############################################################################

HDRS = 	boards.h				\
	comm.h					\
	db.h					\
	handler.h				\
	house.h					\
	inet.h					\
	interpreter.h				\
	mail.h					\
	netdb.h					\
	olc.h					\
	screen.h				\
	shop.h					\
	spells.h				\
	structs.h				\
	utils.h


SRCS =  act.comm.c				\
	act.informative.c			\
	act.movement.c				\
	act.obj.c				\
	act.offensive.c				\
	act.other.c				\
	act.social.c				\
	act.wizard.c				\
	ban.c					\
	boards.c				\
	castle.c				\
	class.c					\
	comm.c					\
	config.c				\
	constants.c				\
	db.c					\
	fight.c					\
	graph.c					\
	handler.c				\
	house.c					\
	interpreter.c				\
	limits.c				\
	magic.c					\
	mail.c					\
	mobact.c				\
	modify.c				\
	objsave.c				\
	olc.c					\
	random.c				\
	shop.c					\
	spec_assign.c				\
	spec_procs.c				\
	spell_parser.c				\
	spells.c				\
	utils.c					\
	weather.c


OBJS = 	comm.o 					\
	act.comm.o 				\
	act.informative.o 			\
	act.movement.o 				\
	act.obj.o				\
	act.offensive.o 			\
	act.other.o 				\
	act.social.o 				\
	act.wizard.o 				\
	ban.o 					\
	boards.o				\
	castle.o 				\
	class.o 				\
	config.o 				\
	constants.o 				\
	db.o 					\
	fight.o 				\
	graph.o 				\
	handler.o				\
	house.o 				\
	interpreter.o 				\
	limits.o 				\
	magic.o 				\
	mail.o 					\
	mobact.o 				\
	modify.o				\
	objsave.o 				\
	olc.o 					\
	random.o 				\
	shop.o 					\
	spec_assign.o 				\
	spec_procs.o				\
	spell_parser.o 				\
	spells.o 				\
	utils.o 				\
	weather.o


# duh rules

.PHONY: all
all : .accepted $(BINDIR)/circle utils


# silly license document that must be viewed at least once before building

.accepted:
	@./licheck


.PHONY:	utils
utils : .accepted $(BINDIR)/autowiz $(BINDIR)/delobjs $(BINDIR)/listrent $(BINDIR)/mudpasswd $(BINDIR)/purgeplay $(BINDIR)/showplay $(BINDIR)/sign $(BINDIR)/split


$(BINDIR)/circle : $(OBJS) 
		$(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $(BINDIR)/circle

$(BINDIR)/autowiz : util/autowiz.c structs.h db.h utils.h
	$(CC) $(CFLAGS) -o $(BINDIR)/autowiz util/autowiz.c

$(BINDIR)/delobjs : util/delobjs.c structs.h
	$(CC) $(CFLAGS) -o $(BINDIR)/delobjs util/delobjs.c

$(BINDIR)/listrent : util/listrent.c structs.h
	$(CC) $(CFLAGS) -o $(BINDIR)/listrent util/listrent.c

$(BINDIR)/mudpasswd   : util/mudpasswd.c structs.h
	$(CC) $(CFLAGS) -o $(BINDIR)/mudpasswd util/mudpasswd.c

$(BINDIR)/purgeplay : util/purgeplay.c structs.h
	$(CC) $(CFLAGS) -o $(BINDIR)/purgeplay util/purgeplay.c

$(BINDIR)/showplay : util/showplay.c structs.h
	$(CC) $(CFLAGS) -o $(BINDIR)/showplay util/showplay.c

$(BINDIR)/sign: util/sign.c
	$(CC) $(CFLAGS) -o $(BINDIR)/sign util/sign.c $(LIBS)

$(BINDIR)/split: util/split.c
	$(CC) $(CFLAGS) -o $(BINDIR)/split util/split.c


.PHONY: clean
clean :
	rm -f *.o util/*.o

TAGS  : $(SRCS) $(HDRS)
	etags -T $(HDRS) $(SRCS)

# set this to the GNU C++ include directory (irrelevant?)
DEPENDINCLUDES = -I/opt/gnu/lib/g++-include

.PHONY: depend
depend   : $(SRCS)
	makedepend -DMAKINGDEPEND $(DEFS) $(DEPENDINCLUDES) $(INCLUDES) $(SRCS)


# DO NOT DELETE THIS LINE -- make depend depends on it.


----->8 snip snip snip 8<------

	Just install the above Makefile in place of the old one and
follow the instructions in the header.  You should be ready ta rock! =8-)



> > 	o Reduce dependency on pre-processor wherever possible
> Duh? Why?

	The pre-processor is used to include files, to conditionally
compile code, and to define macros.  I don't quibble with the first
two uses, but the last one raises some issues.
(Which are admittedly mostly trivial.  :)

	Macros are mostly used to define constants.  With the advent
of the ``const'' keyword in ANSI C, it's really a better idea to
define constants as real variables that have ``const'' prepended to
them.  Such a constant is recorded in the symbol table, whereas it
wouldn't be if you had defined it with #define.  This makes debugging
a bit easier.

	The counter-argument would be, "what if I'm not using an ANSI
C compiler?"  And my counter-counter-argument would be, "this is 1995,
you _should_ be using an ANSI C compiler!"  =8-)

	There are also some other problems associated with using
macros to define parameterized expressions (e.g., defining MAX and
MIN), chiefly that a macro argument may be evaluated more than once.
These problems are more easily addressed in C++, which supports inline
functions and function templates that could be used to replace such
macros.  Unfortunately, this is just ole C, so those options aren't
available to us.  :-(   (See Scott Meyers's _Effective C++_ for more
info on this.)


> > 	o Simplify header files by moving as much as possible to
> > 	  implementation files
> You mean things like AFF_INVIS should not be in structs.h but in
> mudstuff.h?

	Actually, there are a lot of things that could be done to the
header files:

	o add #ifndef FOO_H type sentinels (include guards)
	o if it ain't used outside the header file, move the sucker to
	  the implementation file
	o global header files are Evil!  Break them up into separate,
	  meaningful header files.

		- things are easier to find when the items that have
		  something in common are lumped together

		- changes in global header files usually mean that a
		  buncha stuff gets recompiled that normally
		  shouldn't be ... this is a badism!

		- sadly, there has to be one exception: usually
		  there's a config header file that's global and needs
		  to be global

	o replace all macro constants with C ``const'' definitions

	The basic heuristic for writing a header file is to include
the barest minimum text possible.  Superfluous cruft should be hidden
in the implementation file; the more stuff there is in the header
file, the more a newbie has to work to figure out what's going on.


> > 	o Introduce opaque types to reduce code complexity
> struct char_data -> char_data? or what do you mean?

	Gak!  I _knew_ somebody was gonna ask me about that!  8)

	Basically, I'm talking about doing what the X folks have done
with their types (e.g., the X Toolkit's Widget).  That is, to put a
little object-oriented spin on C programs by "hiding" data in
structures.  That data is manipulated only through specialized
functions.  (In object oriented parlance, these would sorta be a
class's "member functions".)

	So, in "foo.h" we would have:

--->8 snip! 8<---
#ifndef FOO_H		/* names starting with __ are reserved for the compiler */
#define FOO_H


typedef struct _FooRec *Foo; /* we don't know the internals of
				_FooRec -- they're "hidden" from us! */
	

/*
 *  Constructors and Destructors
 */

Foo 	fooCreate( void );	/* foo constructor */
void    fooDestroy( Foo foo );	/* foo destructor */


/*
 *  Inspectors
 */

int	     fooGetVal( Foo foo ); /* get value for foo */
const char*  fooGetName( Foo foo ); /* get the foo's name */


/*
 *  Modifiers
 */
int	     fooSetVal( Foo foo, int i ); /* set foo to a new value */
char*	     fooSetName( Foo foo, const char* n ); /* set foo's name */

#endif

--->8 snip! 8<---



	And in the implementation file, we would have:

--->8 snip! 8<---
#include <stdlib.h>
#include <string.h>

#include "foo.h"


/*  This defines the structure that was "invisible" in the header file. */

typedef struct _FooRec {
  int	val;			/* some arbitrary number */
  char*	name;			/* whatever */
} FooClassRec;


/*
** constructor
*/
Foo
fooCreate( void )
{
  Foo foo = malloc( sizeof(FooClassRec) ); /* create a new Foo off the heap */

  if ( ! foo ) return 0;	/* if unable to malloc, return NULL */

  foo->val = 0;			/* set these guys to some sane values */
  foo->name = 0;

  return foo;			/* ... and return the new object */

} /* fooCreate() */



/*
** destructor
*/
void    
fooDestroy( Foo foo )
{
  if ( ! foo ) return;		/* go away if it's invalid */

  if ( foo->name ) free( foo->name ); /* first free up any allocated 
					memory resources */

  free( foo );			/* finally free foo itself */

} /* fooDestroy */




/*
** inspectors
*/

/* if the object is valid, return the internal value; else return 0 */

int	     
fooGetVal( Foo foo )
{
  if ( foo ) return foo->val;
  else return 0;
} /* fooGetVal() */



/* if the object is valid, return a pointer to the name; else return 0 */

const char*  
fooGetName( Foo foo )
{
  if ( foo ) return foo->name;
  else return 0;
} /* fooGetName() */




/*
** modifiers
*/

/* if the object is valid, set the internal value to ``i''; return the
   value as a status check (0 means object was probably invalid) */

int	     
fooSetVal( Foo foo, int i )
{
  if ( foo )
  {
    return foo->val = i;
  }
  else
    return 0;
} /* fooSetVal() */



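/* if the object is valid, set the internal name string to a copy of
   ``n'', freeing any previous name; the pointer to the newly created
   string is returned as a status check (0 if the allocation failed or
   the object or ``n'' was invalid; on failure the old name is kept) */

char *
fooSetName( Foo foo, const char* n )
{
  char* copy;

  if ( ! foo || ! n )
    return (char*) 0;

  copy = malloc( strlen( n ) + 1 );	/* strdup() isn't ANSI C, so copy by hand */
  if ( ! copy )
    return (char*) 0;			/* keep the old name on failure */
  strcpy( copy, n );

  if ( foo->name ) free( foo->name );	/* don't leak the previous name */
  foo->name = copy;

  return copy;
} /* fooSetName () */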

--->8 snip! 8<---

	Some sample test code:

--->8 snip! 8<---
#include <stdio.h>
#include <stdlib.h>			/* for exit() */
#include "foo.h"

int
main( void )
{
  Foo foo = fooCreate();	/* create new foo object */

  if ( ! foo )			/* the constructor returns 0 if malloc fails */
  {
    fprintf( stderr, "fooCreate() failed\n" );
    exit(1);
  }

  fooSetVal( foo, 42 );		/* set some nifto values */
  fooSetName( foo, "Whoo, whoo!" );

				/* now verify that we got 'em */
  printf( "foo: %d, %s\n", fooGetVal( foo ), fooGetName( foo ) );

  fooDestroy( foo );		/* throw it away when we're done */

  exit(0);			/* buh-bye! */
}
--->8 snip! 8<---

	And a sample run:

{flintstone:11} ./test
foo: 42, Whoo, whoo!



	Well, you _did_ ask!  =8-)

	This buys us a few things.  First, the data and the functions
used to manipulate those data are within the same lexical scope.  In
computer science geek-o parlance, they're more "cohesive."  This is as
opposed to most C programs where you have structures declared in one
part of the program and the functions that can manipulate instances of
those structures spread all over creation.  By using opaque data
types, we've consolidated everything that manipulates a particular
type into a single area; and all the internal details of that type are
hidden from the programmer (unless they cheat and peek under the
hood!).  Remember what I said about minimalist header files -- a lot
of the cruft that normally would be visible in the header file is
hidden in the implementation file.

	I was just proposing that some of the basic types (e.g.,
structures for players, worlds, zones, and objects) be transformed
into opaque types.  The code would be cleaner and (probably) have
fewer bugs.

> > 	o Rely on lex and yacc for command line parsing
> Why? Isn't it better to make our own parser that will take things like "P s"
> as a format id for a function expecting 1 player and a string, that will 
> process errors and such?

	Nope!  I would rather write a parser that used lex and yacc
(or any of their progeny).  That way I could create more flexible
syntax that would probably be parsed with fewer errors.  Why re-invent
the wheel when there's one already around that works extremely well?

> > 	o Re-visit the command language grammar and its implementation
> See above

	Someone has already lamented that the current command line
grammar is too inflexible; extending it is difficult at best.  For
example, how would you go about adding an optional numeric parameter
to specify how many times you want to move in a particular direction?
Looking at ``interpreter.c'', I can't see how to do that straightaway. 

> > 	o Rely on lex and yacc for world, zone, shop, and object file parsing
> Yeees. I can live with that. Only it will be nice to see an implementation...
> Must be ruff stuff.

	Actually, as mentioned in a previous post, this is probably
one of the first areas I'll tackle.  Again, I suspect the dearth of
world designers is caused by the syntactic obfuscation of the zone,
object, and world files.  This _certainly_ can be easily rectified by
using lex and yacc.  Stay tuned to this mailing list.  ;)

> > 	o Re-visit the world, zone, shop, and object file language syntax
> Agree.

	Type atcha later!

Mark
-- 
Mark Coletti                       |  DBA Systems, Inc.  Fairfax, VA
mcoletti@clark.net                 |  United States Geological Survey
http://www.clark.net/pub/mcoletti  |  Office of Standards & Technology
               Is that seat saved? No, but we're praying for it.



This archive was generated by hypermail 2b30 : 12/07/00 PST