Re: Cursing code

From: Templar Viper (templarviper@hotmail.com)
Date: 05/14/03


From: "Thomas Arp" <t_arp@stofanet.dk>
> From: "Templar Viper" <templarviper@HOTMAIL.COM>
> > A while ago, I fixed this code together. It checks the argument for
> > cursing (placed in curse_list), and replaces it with a harmless beep. I
> > had to use str_strr, as strstr is case sensitive. However, I want to
> > ignore certain characters, such as spaces, full stops and more of those.
>
> I'm a bit unclear on your intent here;
> Do you wish to make sure to catch "CUR SE", too? Or do you mean you have
> 'multiword' curse words ?

Yes, I do want to catch "CUR SE".

> Regardless, your code has a few drawbacks. First, it's slow. The
> implementation below will cost something like O(N*N*N*M), where N is the
> number of words in the curse word list, and M is the length of the string
> to check. Considering that you can have a curse word file with perhaps
> 100+ entries, this amounts to a lot of CPU time.
>
> Secondly, if you try to catch 'split' curse words, like above, be aware
> that you'll get a lot of false positives from 'assistant', 'damnation',
> 'as said before', and so on. My example below will remove whitespace
> before the search.

Actually, I made this code with the intent to remove the more offensive,
sexually based curses. Check the xname file to see what I mean. I don't
care about an ass or damn every once in a while, but I want the more
serious words beeped out.

I don't think I will get a lot of false positives, as long as I'm checking
for the really bad ones :)

> Here's what I suggest:
> download Stephen R. van den Berg's strcasestr() implementation from
> http://www.uclibc.org/cgi-bin/cvsweb/*checkout*/uClibc/libc/string/Attic/strstr.c?rev=1.6&hideattic=0
> strcasestr() does what your str_strr code does, just at O(N*M), instead of
> O(N*(M*N)) - be aware that strlen() is an expensive function timewise.

OK, there's one problem: what should VAL() be defined as? Does it compare
ASCII values?
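
A note on the VAL() question: one plausible reading (an assumption, not
checked against the uClibc file) is that VAL() maps each character to the
value that actually gets compared, so a case-insensitive search would fold
case there, for example with tolower(). The sketch below shows that idea
with a naive scan. ci_strstr and this VAL definition are illustrative
names, not the code from the linked file.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: VAL() folds case so that 'A' and 'a' compare equal.
 * The cast keeps tolower() defined for chars with the high bit set. */
#define VAL(c) (tolower((unsigned char)(c)))

/* ci_strstr is a hypothetical name. Naive O(N*M) scan that never
 * calls strlen(); returns the first match or NULL. */
const char *ci_strstr(const char *haystack, const char *needle)
{
  const char *h, *n;

  if (*needle == '\0')
    return haystack;            /* empty needle matches at the start */

  for (; *haystack; haystack++) {
    /* walk both strings while the folded characters agree */
    for (h = haystack, n = needle; *n && VAL(*h) == VAL(*n); h++, n++)
      ;
    if (*n == '\0')
      return haystack;          /* reached the end of needle: match */
  }
  return NULL;
}
```

With this, ci_strstr("You CuRsE", "curse") returns a pointer to the
"CuRsE" part of the first string.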

> Then, change your code a bit, depending on what you meant above:
>
> > Here is the function:
> >
> > int wordok(char *argument, bool call)
> > {
> >   int i;
> >   char temp[MAX_INPUT_LENGTH];
>     char *p = argument, *q = temp;
> >
> >   /* return valid if list doesn't exist */
> >   if (!argument || curse_num < 1 || !curse_list)
> >  return (1);
> >
> -   strlcpy(temp, argument, sizeof(temp));
> /* this loop copies 'argument' into temp without *s */
> +    for (; *p && p - argument < sizeof(temp);) {
> +      if (*p == ' ' || *p == '\t')
> +        continue;
> +      *q++ = *p++;
> +    }
> +    *q = '\0';

This is an infinite loop, methinks. When called with a space, Circle jumps
to 100% CPU usage, stops responding, and needs to be killed.
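
For what it's worth, the hang looks like it comes from the `continue`: it
skips the `*q++ = *p++` line, which is the only place p advances, so the
loop sits on the same blank forever. A minimal sketch of a copy loop that
moves the source pointer in the for header instead, and bounds the write on
the destination side, might look like this. strip_blanks is a hypothetical
helper name, not something from the stock CircleMUD source.

```c
#include <stddef.h>

/* strip_blanks (hypothetical helper): copy src into dst, skipping
 * spaces and tabs. Writes at most dstsize bytes including the
 * terminating NUL and returns the number of characters copied. */
size_t strip_blanks(char *dst, size_t dstsize, const char *src)
{
  char *q = dst;

  if (dstsize == 0)
    return 0;                   /* no room even for the NUL */

  for (; *src && (size_t)(q - dst) < dstsize - 1; src++) {
    if (*src == ' ' || *src == '\t')
      continue;                 /* src++ in the for header still runs */
    *q++ = *src;
  }
  *q = '\0';
  return (size_t)(q - dst);
}
```

In wordok() this would be called as strip_blanks(temp, sizeof(temp),
argument), so "CUR SE" ends up in temp as "CURSE".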

<Rest of function>

> You didn't send a copy of search_replace(), so I'll not bother with
> that one. This implementation should run in O(M+(N*N*M)).

Here is search_replace, I didn't write this myself, I found it somewhere
on the developer site instead.

void search_replace(char *string, const char *find, const char *replace)
{
  char final[MAX_INPUT_LENGTH], temp[2];
  size_t start, end, i;

  while (str_strr(string, find) != NULL) { /* str_strr should be strcasestr */
    final[0] = '\0';
    start = str_strr(string, find) - string;
    end = start + strlen(find);

    temp[1] = '\0';

    strncat(final, string, start);

    strcat(final, replace);

    for (i = end; string[i] != '\0'; i++) {
      temp[0] = string[i];
      strcat(final, temp);
    }

    strcpy(string, final); /* plain copy; sprintf(string, final) would treat '%' as a format */

  }
  return;
}
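
The function above searches the whole string twice per pass, rebuilds the
result one character at a time, and has no bound on what it writes. A
possible in-place variant is sketched below, assuming the caller knows the
capacity of the buffer. search_replace_n and ci_find are hypothetical
names, and ci_find is only a stand-in for whatever case-insensitive search
ends up being used (strcasestr or similar).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ci_find (hypothetical): naive case-insensitive strstr(). */
static const char *ci_find(const char *hay, const char *needle)
{
  size_t i;

  for (; *hay; hay++) {
    for (i = 0; needle[i] &&
         tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]);
         i++)
      ;
    if (needle[i] == '\0')
      return hay;
  }
  return NULL;
}

/* search_replace_n (hypothetical): replace every case-insensitive match
 * of find inside string, whose buffer holds size bytes. Returns the
 * number of replacements; stops early if the result would not fit. */
int search_replace_n(char *string, size_t size,
                     const char *find, const char *replace)
{
  size_t flen = strlen(find), rlen = strlen(replace);
  char *pos = string;
  const char *hit;
  int count = 0;

  if (flen == 0)
    return 0;                             /* nothing sensible to search for */

  while ((hit = ci_find(pos, find)) != NULL) {
    char *at = string + (hit - string);   /* writable alias of the match */
    size_t tail = strlen(at + flen) + 1;  /* text after the match, with NUL */

    if ((size_t)(at - string) + rlen + tail > size)
      break;                              /* would overflow: leave the rest */

    memmove(at + rlen, at + flen, tail);  /* shift the tail into place */
    memcpy(at, replace, rlen);            /* write the replacement bytes */
    pos = at + rlen;                      /* resume after the replacement */
    count++;
  }
  return count;
}
```

Because the search resumes after the inserted text, a replacement that
itself contains the search word cannot cause an endless loop, and the bytes
are copied directly rather than going through a printf-style format.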

Thanks for the attention!
