Regular Expressions, Part 2

September 18, 2009

In the previous exercise, we decided on an intermediate representation for regular expressions and wrote a parser that produces the intermediate representation corresponding to a regular expression given as input.

Your task today is to write the corresponding matcher that takes an intermediate representation and a target string and returns a boolean indicating whether or not the regular expression matches the target string, using the same conventions as the prior exercise. Your matcher will be similar to Rob Pike’s matcher that we studied previously. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

7 Responses to “Regular Expressions, Part 2”

  1. kbob said

    I hope you’re planning on moving on to NFA-based regular expression matching, because the naive, exponential-time regex engine this series is developing is severely suboptimal. Automata, NFA and DFA, are beautiful, useful, and not difficult.

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/09/18/programming-praxis-regular-expressions-part-2/ for a version with comments):

    import Data.List
    
    data Elem = Lit Char | Esc Char | Any | Set Bool [Elem] deriving Show
    data Chunk = Elem Elem | BoL | EoL | Star Elem deriving Show
    
    matchElem :: Elem -> Char -> Bool
    matchElem e c = case e of Lit l    -> l == c
                              Esc s    -> show c == '\\':[s]
                              Set b es -> b == any (`matchElem` c) es
                              Any      -> True
    
    matchHere :: [Chunk] -> String -> Bool
    matchHere (Elem r:rs) (x:xs) = matchElem r x && matchHere rs xs
    matchHere (Star e:r)  xs     = matchStar e r xs
    matchHere [EoL]       xs     = null xs
    matchHere r           _      = null r
    
    matchStar :: Elem -> [Chunk] -> String -> Bool
    matchStar _ r xs     | matchHere r xs = True
    matchStar e r (x:xs) = matchElem e x && matchStar e r xs
    matchStar _ _ _      = False
    
    match :: [Chunk] -> String -> Bool
    match (BoL:r) = matchHere r
    match r       = any (matchHere r) . tails
    
  3. Remco Niemeijer said

    On a side note, the ‘HOWTO: Posting source code’ page no longer shows the actual tags required. You might want to fix that.

    Also, the source code plugin now seems to htmlEncode characters like & and >, which screws up function types and such.

  4. programmingpraxis said

    kbob: This blog is getting programmers to write code, so that they can maintain and improve their coding skills. This blog is not about computer science or the comparative study of algorithms. It really doesn’t matter to me that I have the fastest algorithm that solves the particular programming problem I am discussing. I do have a two-part series regex->nfa, nfa->dfa on my idea list, but that doesn’t mean I’ll ever write it.

  5. programmingpraxis said

    Remco: I fixed the ‘sourcecode’ tags on the HOWTOpage. And I sent a bug report to the WordPress support center. Thanks for pointing out the problems.

  6. programmingpraxis said

    Sheri, from the WordPress Support Department, reports that the bug in the ‘sourcecode’ feature has now been fixed. I reported the bug on Friday evening and it was fixed by Sunday morning. Well done, WordPress!

    Phil

Leave a comment