4  An extended example

Here’s an extended example from Friedl [1, p 189] that covers many of the features described above. The problem is to fashion a regexp that will match all and only IP addresses (dotted quads), i.e., four numbers separated by three dots, with each number between 0 and 255. We will use the commenting mechanism to build up the final regexp readably. First, we define a subregexp n0-255 that matches 0 through 255.

(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")

The first two alternates simply match all single- and double-digit numbers. Since zero-padding is allowed, we need to match both 1 and 01. We must be careful with 3-digit numbers, since numbers above 255 must be excluded, so we fashion alternates to match 000 through 199, then 200 through 249, and finally 250 through 255.⁶
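The claim that these five alternates cover exactly 0 through 255, padded or not, is easy to check exhaustively. The sketch below is in Python rather than Scheme and uses Python's `re` module as a stand-in for pregexp, on the assumption that the two dialects agree on these simple constructs. It drops the extended-mode comments and uses Python's non-capturing `(?:...)` group; the names `N0_255`, `matches_whole`, and `spellings` are hypothetical helpers for illustration only.

```python
import re

# Python rendering of the pregexp subregexp n0-255 (hypothetical name N0_255).
# Same five alternates, in the same order, inside a non-capturing group.
N0_255 = r"(?:\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5])"


def matches_whole(s):
    """True if N0_255 matches all of s (not just a prefix)."""
    return re.fullmatch(N0_255, s) is not None


def spellings(n):
    """n written with 1 to 3 digits, zero-padded on the left where needed."""
    s = str(n)
    return {s.rjust(width, "0") for width in range(len(s), 4)}


# Every spelling of 0..255 (e.g. "7", "07", "007") is accepted...
accepts_range = all(matches_whole(s) for n in range(256) for s in spellings(n))
# ...while 256..999 and four-digit strings are rejected.
rejects_big = not any(matches_whole(str(n)) for n in range(256, 1000))
rejects_long = not matches_whole("0000")
print(accepts_range, rejects_big, rejects_long)  # True True True

# Unanchored, the first alternate \d wins and only a prefix is matched;
# forcing a whole-string match makes the engine backtrack to 25[0-5].
print(re.match(N0_255, "255").group())      # 2
print(re.fullmatch(N0_255, "255").group())  # 255
```

The last two lines illustrate the point of note 6: with the prefixes listed first, an unanchored match settles for the shortest alternate, whereas a pattern that must consume the whole string forces the engine to try the longer ones.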

An IP address is a string that consists of four n0-255s with three dots separating them.

(define ip-re1
  (string-append
    "^"        ;nothing before
    n0-255     ;the first n0-255,
    "(?x:"     ;then the subpattern of
    "\\."      ;a dot followed by
    n0-255     ;an n0-255,
    ")"        ;which is
    "{3}"      ;repeated exactly 3 times
    "$"        ;with nothing following
    ))
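The same concatenation can be mirrored in Python to reproduce the three results discussed next. Again this is a sketch with Python's `re` standing in for pregexp; `first_match` is a hypothetical helper that imitates pregexp-match closely enough for these examples, returning a one-element list holding the overall match, or `None` where pregexp returns `#f`.

```python
import re

# Python rendering of n0-255, as a non-capturing group (hypothetical name).
N0_255 = r"(?:\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5])"

# Mirrors the string-append in ip-re1, piece by piece.
IP_RE1 = "".join([
    "^",     # nothing before
    N0_255,  # the first n0-255,
    "(?:",   # then the subpattern of
    r"\.",   # a dot followed by
    N0_255,  # an n0-255,
    ")",     # which is
    "{3}",   # repeated exactly 3 times
    "$",     # with nothing following (Python's $ also allows one trailing "\n")
])


def first_match(pattern, s):
    """Hypothetical stand-in for pregexp-match: [whole match] or None (for #f)."""
    m = re.search(pattern, s)
    return [m.group(0)] if m else None


print(first_match(IP_RE1, "1.2.3.4"))         # ['1.2.3.4']
print(first_match(IP_RE1, "55.155.255.265"))  # None
print(first_match(IP_RE1, "0.00.000.00"))     # ['0.00.000.00']
```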

Let’s try it out.

(pregexp-match ip-re1
  "1.2.3.4")
=> ("1.2.3.4")

(pregexp-match ip-re1
  "55.155.255.265")
=> #f

which is fine, except that we also have

(pregexp-match ip-re1
  "0.00.000.00")
=> ("0.00.000.00")

All-zero sequences are not valid IP addresses! Lookahead to the rescue. Before starting to match ip-re1, we look ahead to ensure we don’t have all zeros. We could use positive lookahead to ensure there is a digit other than zero:

(define ip-re
  (string-append
    "(?=.*[1-9])" ;ensure there's a non-0 digit
    ip-re1))

Or we could use negative lookahead to ensure that what’s ahead isn’t composed of only zeros and dots.

(define ip-re
  (string-append
    "(?![0.]*$)" ;not just zeros and dots
                 ;(note: dot is not metachar inside [])
    ip-re1))
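Both lookahead variants can be compared side by side. This Python sketch again uses `re` in place of pregexp; the sample strings and the names `IP_RE_POS`, `IP_RE_NEG`, and `results` are illustrative. In each variant the lookahead sits before the `^` of ip-re1; being zero-width, it inspects the string from the start without consuming anything.

```python
import re

N0_255 = r"(?:\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5])"
IP_RE1 = r"^" + N0_255 + r"(?:\." + N0_255 + r"){3}$"

# Variant 1: positive lookahead, some digit 1-9 must appear somewhere ahead.
IP_RE_POS = r"(?=.*[1-9])" + IP_RE1
# Variant 2: negative lookahead, what's ahead must not be only zeros and dots.
IP_RE_NEG = r"(?![0.]*$)" + IP_RE1

samples = ["1.2.3.4", "0.0.0.0", "0.00.000.00", "10.0.0.0",
           "255.255.255.255", "256.1.1.1", "1.2.3"]

# For each sample: (matched by positive variant, matched by negative variant)
results = {
    s: (re.search(IP_RE_POS, s) is not None,
        re.search(IP_RE_NEG, s) is not None)
    for s in samples
}

for s, (pos, neg) in results.items():
    print(f"{s:<16} positive={pos} negative={neg}")
```

On these samples the two variants agree: the all-zero strings are now rejected, while an address such as 10.0.0.0, which contains zeros but is not all zeros, still passes.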

The regexp ip-re matches all and only valid IP addresses.

(pregexp-match ip-re
  "1.2.3.4")
=> ("1.2.3.4")

(pregexp-match ip-re
  "0.0.0.0")
=> #f


⁶ Note that n0-255 lists prefixes first, as preferred alternates, something we cautioned against in section 3.5. However, since we anchor this subregexp explicitly (ip-re1 surrounds it with ^, dots, and $) to force an overall match, the order of the alternates does not matter here.