Code/etc: Command interpreter improvements ...

From: Daniel Koepke (dkoepke@CALIFORNIA.COM)
Date: 05/25/98


[it's an ObCircle I moved out of an older message because it's long]

ObCircle: (these things get cheaper and cheaper all the time)

Due to some discussion (yeah, right) on rgma, I've been looking at
implementing a more complete command interpreter in my mud.  This
new and improved interpreter would not only include a "natural"
language parser, but improvements to the shorthand stuff as well.  So
far, I'm looking to allow the following (all of these should do the
same thing):

  > get 2.bread bag; eat bread; burp
  > get 2.bread bag; eat it; burp
  > get second bread bag; eat it; burp
  > get 2.bread bag | eat ; burp
  > Get the second bread from my leather bag and eat it, then burp.
  > Get the second bread from my bag, eat it, and burp.

The short forms, even with the addition of 'it' and the 'pipe' command,
are easy.  (The pipe would function something like the UNIX shell pipe:
it passes the acted-upon object to the command after the pipe.  Since
the bread was the primary thing acted upon [the bag was also acted
upon, but was only secondary], we pass the bread to the 'eat'
command.)  The hard part comes with the longer sentences; we need
to, at the very least, attempt to understand some English so we can
know what to expect.  So, we start off by tokenizing the string and
then examining it word by word.  We only understand verbs in our
command table, articles can be discarded (and prep.?), and we don't
attempt to tell the difference between adjectives and nouns -- we
just get the object to be acted upon, no matter how many adjectives
are used before the noun.
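
For what it's worth, here is a minimal sketch of how the 'it' and pipe
shorthands might be expanded before a command reaches the normal
interpreter.  Everything in it is made up for illustration: struct
parse_ctx, expand_command(), and the rule that the first argument after
the verb is the primary object are assumptions, not stock CircleMUD
code.  It also uses plain strcmp() where Circle would use str_cmp().

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64

/* Hypothetical per-player parser state: the last primary object named. */
struct parse_ctx {
  char last_obj[MAX_WORD];
};

/*
 * Rewrite one short-form command into 'out'.
 *  - "eat it"            becomes "eat <last object>"
 *  - "eat" after a pipe  becomes "eat <last object>"
 * Assumption: the first argument after the verb is the primary object,
 * and it is remembered for the commands that follow.
 */
void expand_command(struct parse_ctx *ctx, const char *cmd, int piped,
                    char *out, size_t outlen)
{
  char verb[MAX_WORD] = "", obj[MAX_WORD] = "";
  const char *rest;
  int n1 = 0, n2 = 0;

  if (outlen == 0)
    return;
  out[0] = '\0';

  if (sscanf(cmd, " %63s%n", verb, &n1) < 1)
    return;                       /* empty command */
  if (sscanf(cmd + n1, " %63s%n", obj, &n2) < 1)
    n2 = 0;                       /* verb with no argument */

  rest = cmd + n1 + n2;           /* e.g. "bag" in "get 2.bread bag" */
  rest += strspn(rest, " ");

  if (!strcmp(obj, "it") || (piped && !*obj))
    snprintf(obj, sizeof(obj), "%s", ctx->last_obj);
  else if (*obj)
    snprintf(ctx->last_obj, sizeof(ctx->last_obj), "%s", obj);

  if (*obj)
    snprintf(out, outlen, "%s %s%s%s", verb, obj, *rest ? " " : "", rest);
  else
    snprintf(out, outlen, "%s", verb);
}
```

The caller would split the input line on ';' and '|' first, and pass
piped = 1 for a command that followed a '|'.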

  "get" -> verb (article/object follows)
  "the" -> definite article (object follows)
  "second bread" -> object (prep/conj follows)

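The trace above amounts to a small word classifier.  A rough sketch,
assuming lowercase input and hard-coded word lists (the lists, the enum
and classify_word() are all hypothetical; a real mud would check the
verb against its command table instead):

```c
#include <string.h>

enum word_type { W_VERB, W_ARTICLE, W_PREP, W_CONJ, W_OBJECT };

/* Hypothetical word lists, for illustration only. */
static const char *const verbs[]    = { "get", "take", "eat", "burp", "throw", NULL };
static const char *const articles[] = { "the", "a", "an", NULL };
static const char *const preps[]    = { "from", "in", "into", "on", NULL };
static const char *const conjs[]    = { "and", "then", NULL };

/* Return 1 if 'word' appears in the NULL-terminated 'list'. */
static int in_list(const char *word, const char *const *list)
{
  int i;

  for (i = 0; list[i]; i++)
    if (!strcmp(word, list[i]))
      return 1;
  return 0;
}

/*
 * Classify one token.  Anything we don't recognize is treated as part
 * of an object name, so adjectives and nouns are lumped together.
 */
enum word_type classify_word(const char *word)
{
  if (in_list(word, verbs))
    return W_VERB;
  if (in_list(word, articles))
    return W_ARTICLE;
  if (in_list(word, preps))
    return W_PREP;
  if (in_list(word, conjs))
    return W_CONJ;
  return W_OBJECT;
}
```

Consecutive W_OBJECT tokens ("second", "bread") would then be joined
into a single object phrase.
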
If the sentence always ended there, it'd be trivial to add support.
In fact, CircleMUD is very close because it discards fill words.  It
would end up with "get second bread", and adding support for first,
second, third (interpret as dot notation) would be fairly easy.  We'd
get, "get 2.bread".  However (and unfortunately), the sentence doesn't
end there,

  "from" -> prep (article/object follows)
  "my leather bag" -> object (prep/conj follows)

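The ordinal rewrite mentioned earlier ("second bread" to "2.bread")
might look something like this.  It's only a sketch: ordinal_to_dot()
and the ordinals table are hypothetical names, and it only knows the
first five ordinals.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table; index + 1 is the number used in dot notation. */
static const char *const ordinals[] = {
  "first", "second", "third", "fourth", "fifth", NULL
};

/*
 * If 'phrase' starts with an ordinal followed by at least one more
 * word, write "<n>.<rest>" to 'out' and return 1.  Otherwise copy the
 * phrase unchanged and return 0.
 */
int ordinal_to_dot(const char *phrase, char *out, size_t outlen)
{
  char first[32] = "";
  int n = 0, i;

  if (sscanf(phrase, " %31s%n", first, &n) == 1) {
    const char *rest = phrase + n;

    rest += strspn(rest, " ");
    for (i = 0; *rest && ordinals[i]; i++)
      if (!strcmp(first, ordinals[i])) {
        snprintf(out, outlen, "%d.%s", i + 1, rest);
        return 1;
      }
  }
  snprintf(out, outlen, "%s", phrase);
  return 0;
}
```

A lone "second" is left alone, since it might be the name of something.
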
We take 'my' as an adjective and even go so far as to pass it (along
with "leather bag") to generic_find().  This is because there is a
rather cheap and easy way to parse 'my' -- Mailer Code(tm) will
illustrate the point,

  arg = one_argument(arg, name);
  skip_spaces(&arg);

  if (!*name)
    return (0);

  *tar_ch = NULL;
  *tar_obj = NULL;

  /*
   * How to parse 'my'  -dak
   * - If the first word is 'my' and we have more words following ...
   * - If it has 'my', only look for objects.
   * - If it has 'my', only look in inventory and equipment list.
   */
  if (!str_cmp(name, "my") && *arg) {
    arg = one_argument(arg, name);
    REMOVE_BIT(bitvector, FIND_CHAR_ROOM | FIND_CHAR_WORLD |
                          FIND_OBJ_ROOM | FIND_OBJ_WORLD);
  }

Circle still won't be friendly to "leather bag", but it should work
fine for "my bag".  The "leather bag" stuff will have to be handled
elsewhere (get_obj_in_list_vis(), I guess), but we're one step closer
to being able to understand a full sentence.
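
One possible way to handle "leather bag" is to require every word of
the phrase to appear in the object's keyword list.  The sketch below
assumes the keyword list is a space-separated string; phrase_matches()
and word_in_namelist() are hypothetical helpers, not existing Circle
functions.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if the word at 'word' (length 'len') is one of the
 * space-separated keywords in 'namelist', ignoring case.
 */
static int word_in_namelist(const char *word, size_t len, const char *namelist)
{
  const char *p = namelist;

  while (*p) {
    size_t klen, i;

    p += strspn(p, " ");
    klen = strcspn(p, " ");
    if (klen == len && klen > 0) {
      for (i = 0; i < len; i++)
        if (tolower((unsigned char) word[i]) != tolower((unsigned char) p[i]))
          break;
      if (i == len)
        return 1;
    }
    p += klen;
  }
  return 0;
}

/*
 * Return 1 if every word in 'phrase' is a keyword of the object.
 * An empty phrase matches nothing.
 */
int phrase_matches(const char *phrase, const char *namelist)
{
  const char *p = phrase;
  int matched = 0;

  for (;;) {
    size_t len;

    p += strspn(p, " ");
    len = strcspn(p, " ");
    if (len == 0)
      break;
    if (!word_in_namelist(p, len, namelist))
      return 0;
    matched = 1;
    p += len;
  }
  return matched;
}
```

Word order doesn't matter here, so "bag leather" would match too; that
seemed harmless enough for a first pass.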

Hmm, now onto commas, which are the source of a headache for me.  I
think it'd be possible to parse both lists of objects and lists of
commands to perform correctly.  Take, for instance,

  Take the big, red, rubber ball.
  Take the red ball, green ball, and yellow ball.
  Take the red ball, the green ball, and the yellow ball.
  Take the big, red, rubber ball, the medium-sized, yellow, plastic
    ball, and the small, green, tennis ball.

The first sentence is easy enough to parse.  We assume that anything
following a comma is either the complete (unabbreviated) name of a
command, or it is part of a list.  This rule works for all of the
sentences, but how to parse the fact that it is a list becomes
difficult.  If we assume that if it's not a verb, it's another
adjective describing the object for "take", then the first sentence
can be parsed as "take" and "big, red, rubber ball".

The next sentence can be parsed as a list of objects by making the
same assumption for commas (either a verb follows it, or it's part of
a list).  Since we have "red ball" with no commas in it, we take it as
the complete name of an object.  Same goes for the next two.  This
might be a problem for, "Take the bread, meat, mustard, and cheese."
(I'm hungry, BTW.)

The next sentence is actually easier to parse than the previous.  The
article 'the' tells us it's a list right from the get go.  'The' will
never follow a comma unless it's a list.

The last one is a headache to read for human eyes, let alone a
computer.  Any thoughts?

Right, so we've defined some rules for commas, and go back to our
original sentence ("Get the second bread from my leather bag, eat it,
and burp.")  We look after the comma and, thankfully, find a verb.
Happily we take "eat it" as a separate command.  The same goes for
"burp".  Done!  Finally ...
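
The comma rule for commands (a comma followed by a verb starts a new
command, anything else is part of a list) could be sketched roughly as
below.  The verb table, is_verb(), command_after_comma() and
split_commands() are all hypothetical; a real version would look the
verb up in the command table and require the unabbreviated name.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical verb table, standing in for the real command table. */
static const char *const verbs[] = { "get", "take", "eat", "burp", "throw", NULL };

/* Return 1 if the word at 'p' (length 'len') is a full verb name. */
static int is_verb(const char *p, size_t len)
{
  int i;
  size_t j;

  for (i = 0; verbs[i]; i++) {
    if (strlen(verbs[i]) != len)
      continue;
    for (j = 0; j < len; j++)
      if (tolower((unsigned char) p[j]) != verbs[i][j])
        break;
    if (j == len)
      return 1;
  }
  return 0;
}

/*
 * Look at the text after a comma.  Skip an optional "and"/"then"; if a
 * verb follows, return a pointer to it, otherwise NULL (list item).
 */
static char *command_after_comma(char *p)
{
  size_t len;

  p += strspn(p, " ");
  len = strcspn(p, " ,.");
  if ((len == 3 && !strncmp(p, "and", 3)) ||
      (len == 4 && !strncmp(p, "then", 4))) {
    p += len;
    p += strspn(p, " ");
    len = strcspn(p, " ,.");
  }
  return is_verb(p, len) ? p : NULL;
}

/*
 * Split 'sentence' in place into commands.  Commas inside object lists
 * are left alone.  Returns the number of commands stored in 'cmds'.
 */
int split_commands(char *sentence, char *cmds[], int max)
{
  char *p = sentence, *next;
  size_t end;
  int n = 0;

  if (max < 1)
    return 0;
  cmds[n++] = p + strspn(p, " ");

  while ((p = strchr(p, ',')) != NULL) {
    if (n < max && (next = command_after_comma(p + 1)) != NULL) {
      *p = '\0';                  /* end the previous command here */
      cmds[n++] = next;
      p = next;
    } else
      p++;                        /* list comma: keep scanning */
  }

  /* Drop the period that ends the sentence, if there is one. */
  end = strlen(cmds[n - 1]);
  if (end && cmds[n - 1][end - 1] == '.')
    cmds[n - 1][end - 1] = '\0';
  return n;
}
```

This does nothing for an object whose name happens to be a verb, which
is one of the ambiguous cases I'd still like ideas on.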

Alright -- I've gone on long enough.  I'd appreciate any comments,
especially for dealing with ambiguous situations (e.g., "Take the big,
red, rubber ball, the small, green, tennis ball, and the medium-sized,
yellow, plastic ball, and throw them.")

-dak

