Here’s an extended example from Friedl [1, p 189] that covers many of the features described above. The problem is to fashion a regexp that will match all and only IP addresses or dotted quads, i.e., four numbers separated by three dots, with each number between 0 and 255. We will use the commenting mechanism to build the final regexp with clarity. First, a subregexp n0-255 that matches 0 through 255.
(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")
The first two alternates simply get all single- and double-digit numbers. Since 0-padding is allowed, we need to match both 1 and 01. We need to be careful when getting 3-digit numbers, since numbers above 255 must be excluded. So we fashion alternates to get 000 through 199, then 200 through 249, and finally 250 through 255 (see note 6).
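The range claims above can be spot-checked by anchoring the subregexp at both ends. This is only a minimal sketch: it assumes the pregexp library has already been loaded and that n0-255 is bound as defined above. The name n0-255-only is a hypothetical helper, not part of the library.

```scheme
;; Assumes pregexp is loaded (e.g. via (load "pregexp.scm"); the file
;; path is an assumption) and that n0-255 is bound as defined above.
(define n0-255-only                 ;hypothetical helper name
  (string-append "^" n0-255 "$"))   ;anchors force a whole-string match

(pregexp-match n0-255-only "7")     ; => ("7")
(pregexp-match n0-255-only "07")    ; => ("07")   0-padding is allowed
(pregexp-match n0-255-only "199")   ; => ("199")
(pregexp-match n0-255-only "255")   ; => ("255")
(pregexp-match n0-255-only "256")   ; => #f       above 255
(pregexp-match n0-255-only "1234")  ; => #f       too many digits
```

Without the anchors the results would differ, since an unanchored match is free to stop after a shorter alternate.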
An IP address is a string that consists of four n0-255s with three dots separating them.
(define ip-re1
  (string-append
    "^"        ;nothing before
    n0-255     ;the first n0-255,
    "(?x:"     ;then the subpattern of
    "\\."      ;a dot followed by
    n0-255     ;an n0-255,
    ")"        ;which is
    "{3}"      ;repeated exactly 3 times
    "$"        ;with nothing following
    ))
Let’s try it out.
(pregexp-match ip-re1 "1.2.3.4")
=> ("1.2.3.4")

(pregexp-match ip-re1 "55.155.255.265")
=> #f
which is fine, except that we also have
(pregexp-match ip-re1 "0.00.000.00")
=> ("0.00.000.00")
All-zero sequences are not valid IP addresses!
Lookahead to the rescue. Before starting to match ip-re1, we look ahead to ensure we don’t have all zeros. We could use positive lookahead to ensure there is a digit other than zero.
(define ip-re
  (string-append
    "(?=.*[1-9])" ;ensure there's a non-0 digit
    ip-re1))
Or we could use negative lookahead to ensure that what’s ahead isn’t composed of only zeros and dots.
(define ip-re
  (string-append
    "(?![0.]*$)" ;not just zeros and dots
                 ;(note: dot is not metachar inside [])
    ip-re1))
The regexp ip-re will match all and only valid IP addresses.
(pregexp-match ip-re "1.2.3.4")
=> ("1.2.3.4")

(pregexp-match ip-re "0.0.0.0")
=> #f
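The two lookahead variants can also be compared side by side. The following is a rough sketch, assuming pregexp is loaded and ip-re1 is bound as defined above. The names ip-re-pos and ip-re-neg are hypothetical, introduced only so that both definitions can coexist.

```scheme
;; Assumes pregexp is loaded and ip-re1 is bound as defined above.
;; ip-re-pos and ip-re-neg are hypothetical names for the two variants.
(define ip-re-pos
  (string-append "(?=.*[1-9])" ip-re1)) ;positive lookahead: some non-0 digit
(define ip-re-neg
  (string-append "(?![0.]*$)" ip-re1))  ;negative lookahead: not all 0s and dots

(pregexp-match ip-re-pos "0.0.0.1")      ; => ("0.0.0.1")
(pregexp-match ip-re-neg "0.0.0.1")      ; => ("0.0.0.1")
(pregexp-match ip-re-pos "0.00.000.00")  ; => #f
(pregexp-match ip-re-neg "0.00.000.00")  ; => #f
(pregexp-match ip-re-pos "256.1.1.1")    ; => #f  (rejected by ip-re1 itself)
(pregexp-match ip-re-neg "256.1.1.1")    ; => #f
```

Since ip-re1 only admits digits and dots, "contains a non-zero digit" and "is not made up solely of zeros and dots" describe the same set of strings, so the two variants agree on every input.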
6 Note that n0-255 lists prefixes as preferred alternates, something we cautioned against in sec 3.5. However, since we intend to anchor this subregexp explicitly to force an overall match, the order of the alternates does not matter.
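The effect described in this note can be seen by matching n0-255 with and without anchors. This is a small sketch, assuming pregexp is loaded and n0-255 is bound as defined in the main text.

```scheme
;; Assumes pregexp is loaded and n0-255 is bound as defined in the main text.
;; Unanchored, the first alternate (a single digit) wins immediately.
(pregexp-match n0-255 "255")                          ; => ("2")

;; Anchored, the matcher must backtrack until an alternate
;; accounts for the whole string.
(pregexp-match (string-append "^" n0-255 "$") "255")  ; => ("255")
```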