rulururu

post Lightweight, and simple pattern matching function

April 18th, 2008

Filed under: C++ — Kai @ 8:08 pm

If it’s wanted to match a string against a pattern based on wild characters (* and ?). The function presented here is a simple one that does this matching.

Patterns can be:

  • bka-b*
  • bk*bonn
  • bka-bo?n
  • b*b?nn

This is a fast, lightweight, and simple pattern matching function.

The function has to be call the function with two arguments. The first one is the string to be compared with the pattern specified in the second arguments.

The return value is accoring to the state if I should match Exact, Any or AnyRepeat.

bool MatchPattern(const char* szCompare, const char* szPattern) 
{
    enum State 
    {
        Exact,        // exact match
        Any,        // ?
        AnyRepeat    // *
    };
 
    const char* szTmp = szCompare;
    const char* szPat = szPattern;
    const char* q = 0;
    int iState = 0;
 
    bool bMatch = true;
    while (bMatch && *szPat) 
    {
        if (*szPat == '*') 
        {
            iState = AnyRepeat;
            q = szPat+1;
        } 
        else if (*szPat == '?') 
            iState = Any;
        else 
            iState = Exact;
 
        if (*szTmp == 0) 
          break;
 
        switch (iState) 
        {
            case Exact:
                bMatch = *szTmp == *szPat;
                szTmp++;
                szPat++;
                  break;
            case Any:
                bMatch = true;
                szTmp++;
                szPat++;
                  break;
            case AnyRepeat:
                bMatch = true;
                szTmp++;
                if (*szTmp == *szPat) 
                  szPat++;
 
                  break;
        }
    }
 
    if (iState == AnyRepeat) 
        return (*szTmp == *szPat);
    else if (iState == Any) 
        return (*szTmp == *pszPat);
    else 
        return bMatch && (*szTmp == *pszPat);
}

Obviously usage is as expected:

if(MatchPattern("bka-bonn", "b*b?nn"))
{ 
//matches
}
else
{
//no match
}

Finally I have to tell you that function doesn’t perform any error checking.

For matching patterns you also can use regular expressions - a good regex c++ library is provided by boost.

Simple example for it’s usage:

boost::regex expr("abcd[a-z].*");
string line("bka-bonn");
 
     if (boost::regex_search(line, expr)) 
     {
        //matches
     } 
     else 
     {
        //no match
     }

Further information can be read in boost’s docs: Boost.Regex

Indeed regular expressions are much more flexible and powerful but if you just want to match with ? and * wild characters the above mentioned function might be a better solution.

As always you also here have to weigh one thing against another. You might wonder what is important performance or flexibility.
If you just need it at one certain point in your project and it will never be enhanced I’d choose the MatchPattern function cause it’s simple and you don’t have to use an external library.
Otherwise, if a lot of lines in your code are based on matching strings, boost’s regex library should be the recommented way.

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

ruldrurd
Powered by WordPress, Content and Design by Kai Bellmann
Entries (RSS) and Comments (RSS)