Re: Using lex and yacc for parsing

From: Mark Coletti (
Date: 11/07/95

Kenneth Cavness pounded furiously on the keyboard:

> On Tue, 31 Oct 1995, Mark Coletti wrote:

> [Jeremy Elson, Our God of CircleMUD(no joke!), wrote:]


> > 	As to being complex, I heartily disagree!  In this case, the possible
> > permutations for "north" commands are explicitly stated.  Nowhere in the 
> > 1500+ lines of ``interpreter.c'' did I see a way to automagically come up 
> > with shortened prefixes for commands.  And, yet, in four lines I've 
> > explicitly done so in a way that, to me, is intuitive.  :)

> Perhaps a better way would be to say "unduly complicated". The isabbrev()
> procedure in handler.c(?) is what we use to parse abbreviations. IMNSHO,
> a good parser would have command-and-name completion implicit within it,
> with a set order of precedence.

	For relatively small reserved word sets, I agree that
automatic abbreviations are nice conveniences.  In larger languages,
automatic abbreviations have the potential to obsfucate.  I say, give
the user a core set of commands, with simple abbreviations for the
more common ones (e.g., 'n' for 'north'), and the ability to make
aliases to tailor the command set to their taste.

> I have just taken a crash-course in bison, and I will have to admit that
> I think it might be a bully idea. It would be highly portable across frames,
> and even possibly would be easy to port to C++

	... that, and it supports re-entrant parsers!

> (Jeremy, you _will_ make Circle IV OOP C++, right? We can get away
> from the obscure DIKU constructions with a lot of ease, and there
> are already socket libs that make the job of working with TCP/IP
> protocols much easier).

	There are also a lot of other, pre-existing C++ class
libraries that would go a long ways to making programming life easier.

	On a related, but not quite germane topic, you might also want
to look into python.  I've been toying with the idea of writing a MUD
in python, which is an interpreted language that offers the same
semantic support that perl does, but in a cleaner and object-oriented
fashion.  (No bloody $ variables!  Aiee!) Check out for mor information.


> > > What if you have 400 commands?

> > 	Then you might have too many commands?  ;)

> Not at all. The social commands, since they are (needlessly) included in
> the command parser, would easily work past 400.

	Again, these could (and should) be exiled to a separate
symbol look-up table.

> > 	Seriously, though, the more appropriate (and what's typically done)
> > method would be to have a core set of tokens that the lexer explicitly looks
> > for, with the remaining keyword set being in some form of symbol table.  The
> > book I mentioned actually goes on to illustrate just such a set-up.

> Well, the problem with this is the typical zork-esque annoyances; in that
> you have "go north" as a necessity instead of "north", and if you do this
> you have to have two commands for "go north" and "north". A better solution
> IMNSHO is an intuitive parser that doesn't rely on an overly strict set of
> opening words.

	I see these as being somewhat orthogonal issues.  How I
implement the parser is a different problem from the one of developing
a sensible grammar.

> A heirarchial command structure can get very irritating and picky very 
> quickly. LPMuds are notorious for this.

	I've never played an LPMud, so I can't argue effectively here.
I'm still convinced that a good command line "language" that's
intuitive could be developed -- it probably won't have a deep tree, a
la zork, but it shouldn't have a flat syntax, a la CircleMUD.
Somewhere in between the two extremes would probably be fine.

> > > How will you handle conflicting commands?
> > 
> > 	Conflicting command syntax is a sign of linguistic ambiguity, which
> > is a badism.
>       ^^^^^^^^
> ACK! Neologisms are _not_ our friends. :P

	Pshaw!  Neologisms are okiley dokiley by me. %-)

> > 	"Remember that conflicts mean that yacc can't properly parse a
> > 	grammar, probably because it's ambiguous, which means that there are
> > 	multiple possible parses for the same input.  ... this usually points
> > 	to a mistake in your language design.  If a grammar is ambiguous to
> > 	yacc, it's almost certainly ambiguous to humans, too."  
> > 		pg. 64, _Lexx & Yacc_

> It then goes out of its way to point out that there _are_ shift/reduce
> conflicts that are no fault of your parser design, but actually are a 
> problem with the inherent strict need in parsing.

	Yes, _L & Y_  point out that certain conflicts are
occasionally to be expected; they're ok if you know they're coming.
But, you'd better have a durn good reason for 'em!

> Read the "info" files concerning "Bison" that come with the GCC
> implemetations.  =)

	Actually, _any_ of the GNU documentation is excellent.  I
learned more about creating makefiles from the GNU "info" make file,
than I ever did from the vacuous man page.

	I'll counter your recommendation with another one: interested
parties should read the GNU "info" file for flex, too.  


> > > CircleMUD's parser is not efficient at all but it gets the job done and 
> > > it's easy to change.  (In 3.0 you just change the order the terms appear 
> > > in the list to change their precedences.)
> One thing that would be nice to see, perhaps, is the list of commands placed
> into a binary tree. The current method of linear-search is ultra-inefficient.

	Certainly something no worse than logarithmic time should be
implemented.  Perhaps some form of hash table would be more
appropriate -- the closer I can get to constant time symbol table
look-ups, the happier I'll be.

	This should give further impetus for a move to C++ -- there
are numerous class container libraries that offer efficient look-up
times.  I'd suggest the Standard Template Library, but most C++
compilers can't deal with the STL templates quite yet.  :-\

> Also, I don't like the "if it ain't broke, don't fix it" mentality.

	Yep, you betcha.  I completely agree.  I argued as much in an
earlier post.

> > textbook; they can't do the same for the CircleMUD parser.
> And there are no good definitive outer (or inner) documentations for 
> CircleMUD parser.

	That's another problem I have, but it's definitely one that's
not going to be easy to fix.  It's hard to document obsfucated code,
especially retroactively.

> > 	I don't mean for this to sound like a out and out diatribe on
> > CircleMUD.  It's a good program with a lot of potential --- as you've
> > emphasized in the documentation, now the program works, so optimization and
> > polishing can begin.  Working on the above points first before adding 
> > features would go a long ways towards achieving those goals.  Then again, 
> > that's JMHO and YMMV.  ;-)

> Frankly, CircleMUD (and ROM, which is a MErc Diriviative, which automatically
> makes it disgusting :P) is one of the only active DIKU deriviatives left.
> It's a _good_ thing, IMO, to discuss possible improvements.

	Discussing improvements should be a continuous task!

> let's face it. even with the great changes in 3.0, DIKU code is still one
> gigantic kludge. It would be a good thing in future releases if we 
> fundamentally changed some of the things that make it spaghetti code.
> (one of _my_ biggest peeves is the spellcasting parser-within-a-parser).

	I'm actually thinking of making fundamental, but simple
changes to the code and posting on the local ftp site.  Would you (or
anyone else, for that matter), be interested in such code?

> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> | Kenneth Cavness                       "Cenn Buie" -- a proud TxDarkFriend   |
> |            University of Texas at Austin         |
> |         Austin, TX 78705                      |
> | "I lost a testicle because of the evils of MASTURBATION!!!" -- BriceW, a.a  |
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=

Mark Coletti                       |  DBA Systems, Inc.  Fairfax, VA                 |  United States Geological Survey  |  Office of Standards & Technology
	  I have a rapier wit, but everyone keeps parrying.

This archive was generated by hypermail 2b30 : 12/07/00 PST