Re: CODE: Regexp/email check

From: Daniel A. Koepke (dkoepke@circlemud.org)
Date: 06/08/01


On Thu, 7 Jun 2001, Torgny 'Artovil' Bjers wrote:

> Id est, check the email to see if it is a valid email address.

E-mail addresses are trivial to check for validity, without the use of a
regular expression.

An e-mail address consists of

  A mailbox (the part before the @)
  A delimiter/separator (@)
  A domain name

The simplest form for specifying a mailbox or domain name is the dot-atom
(see RFC 2822):

  atext ::= [a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]
  dot-atom ::= atext+ ("." atext+)*
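
For anyone who wants the dot-atom rule spelled out in C, here is a minimal
sketch.  The names is_atext() and is_dot_atom() are hypothetical helpers made
up for illustration, and the sketch assumes the C locale, so isalnum() only
accepts ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c belongs to the atext character set. */
static bool is_atext(char c) {
  return c != '\0' &&
         (isalnum((unsigned char) c) ||
          strchr("!#$%&'*+/=?^_`{|}~-", c) != NULL);
}

/*
 * Hypothetical helper: true if the first len bytes of s match
 *   atext+ ("." atext+)*
 * that is, no empty input, no leading or trailing dot, no ".." run.
 */
static bool is_dot_atom(const char *s, size_t len) {
  bool need_atext = true;  /* an atext must start the string and follow each dot */
  size_t i;

  for (i = 0; i < len; i++) {
    if (is_atext(s[i]))
      need_atext = false;
    else if (s[i] == '.' && !need_atext)
      need_atext = true;
    else
      return false;        /* illegal character or misplaced dot */
  }

  return !need_atext;      /* false for empty input or a trailing dot */
}
```

Taking a length instead of relying on the terminating '\0' lets the same
helper check the mailbox in place, without writing into the caller's string.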

So the following C code snippet verifies that an e-mail address has a
non-empty mailbox and domain separated by an '@' and does not contain any
illegal characters:

  /* Needs <ctype.h> for isalnum() and <string.h> for strchr(). */
  bool valid_email(const char *em) {
    /* atext punctuation, plus '.' so that dotted names are accepted. */
    const char punct_okay[] = "!#$%&'*+/=?^_`{|}~-.";
    const char *mailbox = em;
    const char *at = strchr(em, '@');
    const char *domain;

    if (!at)                    /* No '@' separator. */
      return (FALSE);

    /* The domain starts immediately after the @; em itself is not modified. */
    domain = at + 1;

    if (mailbox == at || !*domain) /* No mailbox or domain. */
      return (FALSE);

    /* Scan the mailbox up to the @, stopping at the first bad character. */
    while (mailbox < at) {
      if (!isalnum((unsigned char) *mailbox) &&
          !strchr(punct_okay, *mailbox))
        break;

      mailbox++;
    }

    /* Scan the domain to the end of the string; a second @ is rejected. */
    while (*domain) {
      if (!isalnum((unsigned char) *domain) &&
          !strchr(punct_okay, *domain))
        break;

      domain++;
    }

    /*
     * If we got through both parts without finding a bad character, then
     * mailbox == at and *domain == '\0'.  Otherwise at least one of the
     * tests fails and the function returns FALSE to indicate the e-mail
     * address is invalid.  Only the characters are checked, not where the
     * dots fall, so "a..b@c" still passes.
     */
    return (mailbox == at && !*domain);
  }

Naturally, a Perl programmer who is accustomed to using regular
expressions for this sort of thing will ask: "Is there any shorter way to
do it?  Why are things this long in C?"  The quick answer is that there is
probably a shorter way (many, in fact), but none will be as short as the
Perl equivalent.  Perl was built from the beginning as a text-processing
language (and has since expanded its territory), so it has very strong
tools for this built into the language.  C was originally meant as a
mid-level language for implementing an operating system and its
components, so there was no initial need for strong text-processing
capabilities.  C libraries give us these things, but you can either
incorporate the libraries and wrap them to keep coding in the wrong
paradigm (that is, wrong for the language, not wrong on any ultimate
scale) or learn to do things the C way.
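
For comparison, here is a rough sketch of the library route, assuming a
system that provides the POSIX <regex.h> interface (regcomp(), regexec(),
regfree()).  The function name valid_email_regex() is made up for this
example, and the pattern is simply the dot-atom grammar from earlier written
out twice around an '@'; it is one way to do it, not the only one.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of any library: regex-based check. */
static bool valid_email_regex(const char *em) {
  /* dot-atom "@" dot-atom, as a POSIX extended regular expression. */
  static const char pattern[] =
    "^[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    "@[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*$";
  regex_t re;
  bool ok;

  /* REG_NOSUB: we only want a yes/no answer, not match offsets. */
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;             /* treat a compile failure as "invalid" */

  ok = (regexec(&re, em, 0, NULL, 0) == 0);
  regfree(&re);
  return ok;
}
```

Compiling the pattern on every call keeps the sketch short; code that checks
many addresses would more likely compile it once and reuse the regex_t.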


--
Daniel A. Koepke (dak), dkoepke@circlemud.org
Caveat emptor: I say what I mean and mean what I say.  Listen well.
Caveat venditor: Say what you mean, mean what you say.  Say it well.
