Re: [CODE] #if parser

From: Daniel A. Koepke (dkoepke@california.com)
Date: 06/30/99


On Jul 1, Andrew Ritchie spake to the list:

> I am trying to develop, at the moment, an #if #else-type parser for dynamic
> room descriptions, much like that of the gnu-c #if parser.

Nota bene, this isn't specific to GNU, and it's not a part of the C
programming language, per se.  It's part of the preprocessor.  This is
all, of course, unimportant, and probably a sure sign that I'm a pedant.

> Now for OR statements (|'s), I simply cycle through all the 0s and 1s
> and make sure that at least one of them is a 1, or true.  For AND
> statements (&), I simply cycle through them and make sure that ALL of
> them are set to 1, or set true.

Just to ensure you're not ignoring the most obvious optimization here,
statements should cease being considered once their outcome is certain.
That is, "#if FOO & BAR," should never even look at, expand, or consider
BAR until it's determined that FOO is true.  You will need to implement a
means by which you can detect the form of an #if statement that doesn't
have too much processor overhead associated with it.  This should be done
as a prescan for argument matching.

> Not sure how I would go about doing this. Obviously, I'd need a function to
> parse () statements. Anyone got any ideas?

    if (*lhs == '(') {
        char tmp [512], * ptr = tmp;
        *ptr = '\0'; /* sanity */
        while (isspace(*++lhs)) ;
        while (*lhs && *lhs != ')') *(ptr++) = *(lhs++);
        while (isspace(*ptr)) *(ptr--) = '\0';
        ParseIfStatement(body);
    }

You don't need a function specifically for it, really.  Logical AND and
logical OR are binary operators -- they only operate on two arguments, and
ALL statements using AND and OR can be reduced to this.  Consider,

     #if FOO & (BAR | BAZ)
     `-> DetermineIfValue("FOO & (BAR | BAZ)");
         `-> DetermineType("FOO & (BAR | BAZ)");
             `-> IFTYPE_AND
         ,-------'
         `-> DetermineIfValue("FOO");
             `-> TRUE
         ,-------'
         `-> DetermineIfValue("(BAR | BAZ)"); /* Treated as 1 arg */
             `-> DetermineType("BAR | BAZ");
                 `-> IFTYPE_OR
             ,-------'
             `-> DetermineIfValue("BAR");
                 `-> FALSE
             ,-------'
             `-> DetermineIfValue("BAZ");
                 `-> TRUE
         ,-----------'
         `-> TRUE

No conditions are treated specially -- they all get parsed through
DetermineIfValue().  The function doesn't keep tokens in store -- it
expires them once they're not needed.  That is, the return value of "FOO"
is TRUE, but it isn't inserted back into the #if for later consideration.
If we were an OR statement we'd return TRUE right then and there; since
we're an AND statement, we eliminate the token, instantly and painlessly
simplifying the statement to just "#if BAR | BAZ."  We theoritically never
need to store any return values because we can ALWAYS rely upon
simplification,

    #if (BAR | (BAZ & FOO) | FUM) & FRED & (BARNEY & (BAMBAM | PEBBLES)

can still be parsed cleanly without storing state.  The parser doesn't
attempt to second-guess the order arguments are delivered in.  So while it
might be slightly more efficient to leap ahead and consider FRED (we could
do this based upon the assumption that no condition has side-effects; a
real programming language can't make any such assumptions about its
arguments), we will begin with (BAR | (BAZ & FOO) | FUM).  We then find
the general form, which is COMPLEX_OR (a statement is qualified as complex
if it has more than three arguments; note that the entire statement is
first determined to be a COMPLEX_AND statement before we take the first
argument, which is our (BAR | (BAZ & FOO) | FUM)), and we consider BAR.
The path of execution might look like this,

    (BAR | (BAZ & FOO) | FUM)
    `-> DetermineType("BAR | (BAZ & FOO) | FUM");
        `-> IFTYPE_COMPLEX_OR
    ,-------'
    `-> DetermineIfType("BAR");
        `-> FALSE
    ,-------'
    `-> DetermineIfType("(BAZ & FOO)");
        `-> DetermineType("BAZ & FOO");
            `-> IFTYPE_AND
        ,-------'
        `-> DetermineIfType("BAZ");
            `-> TRUE
    ,-----------'
    `-> TRUE

if we have a situation where BAR = FALSE, BAZ = TRUE.  We don't know, nor
care about, the state of the other arguments.

As to how you can reduce any statement using binary operators down to just
two arguments, "#if ((FOO & BAR) | BAZ & FUM & FRED) | BARNEY," becomes,

    #if ((((FOO & BAR) | BAZ) & FUM) & FRED) | BARNEY

Thus we consider FOO & BAR, we return TRUE or FALSE.  The implementation
is a piece of cake with a recursive function, and it's behavior is easy to
follow.  With an AND statement we short-circuit and return FALSE if one of
the arguments returns FALSE; with an OR statement we short-circuit and
return TRUE if one of the arguments returns TRUE.  In this way, only an
extremum are ever fully parsed (an AND statement which is wholly true or
an OR statement which is wholly false).

Was that of any help?

-dak


     +------------------------------------------------------------+
     | Ensure that you have read the CircleMUD Mailing List FAQ:  |
     |  http://qsilver.queensu.ca/~fletchra/Circle/list-faq.html  |
     +------------------------------------------------------------+



This archive was generated by hypermail 2b30 : 12/15/00 PST