Re: Using lex and yacc for parsing

From: Jeremy Elson (jelson@blaze.cs.jhu.edu)
Date: 10/31/95


> 	Hopefully I've shown that lex and yacc (or their equivalents)
> are way useful at building parsers.  In a matter of about half an
> hour, I was able to cobble together a very, very simple parser that
> would've taken much longer to write out by hand.  Better yet, the
> generated parser is more efficient than one I could write in a
> reasonable amount of time.  To quote from _Lex and Yacc_ by John
> Levine, Tony Mason and Doug Brown, "A lex lexer is almost always
> faster than a lexer you might write in C by hand." (pg. 1) Also, it's
> certainly _much_ easier to understand than the one found (somewhere)
> in CircleMUD, IMHO.  =8-)

Lex is a great tool, but I have avoided including a lex-built parser for
a number of reasons, mainly because the efficiency gains of a lex-built
parser are not worth the complexity it introduces.

For example, in your example lex file, you had to enumerate

n
no
nor
north

All that just for "north"!  This is complex, and it doesn't scale.  What if
you have 400 commands?  How will you handle conflicting commands?  You'd
have to do something like:

n -> north
no -> north
nor -> north
nos -> noshout
nort -> north
north -> north
nosh -> noshout
nosho -> noshout
noshou -> noshout
noshout -> noshout

What would you have to do if you decide that "n" means "noshout" instead
of north?  What if you have 15 different commands that start with n?

CircleMUD's parser is not efficient at all, but it gets the job done and it's
easy to change.  (In 3.0 you just change the order in which the commands
appear in the list to change their precedence.)

Many people have said in the past that the CircleMUD parser is inefficient
and should be changed.  I agree that it's inefficient, but I do not think
it should be changed, because it's simple and it's efficient *enough*.
Have you ever profiled a running CircleMUD?  If you do, you'll find that
an incredibly small amount of time (around 0.5%, if I remember correctly)
is spent in command_interpreter() and its subsidiaries.

If you make CircleMUD's parser *10 times* more efficient, you'll make
CircleMUD only about 0.45% faster (that 0.5% drops to 0.05%).  Is that
really worth it?

Besides, CPU time is cheap -- people's time is *not* cheap.  If I've
learned anything about software engineering, it is that your goal should
be to reduce the amount of work *people* have to do or the amount of time
people have to wait, not the amount of work computers have to do.  Computers
don't care.

In other words, if you want to rewrite a huge chunk of CircleMUD code in
a way that's more complex than it already is, your time would be much
better spent attacking a piece of the code that could benefit much more
from improvement -- say, the way mobile specprocs are called every 10
seconds, or changing the specproc structure over to being event driven,
or some such.

Regards,

Jeremy



This archive was generated by hypermail 2b30 : 12/07/00 PST