pregexp.scm provides the procedures pregexp, pregexp-match-positions, pregexp-match, pregexp-split, pregexp-replace, pregexp-replace*, and pregexp-quote. All the identifiers introduced by pregexp.scm have the prefix pregexp, so they are unlikely to clash with other names in Scheme, including those of any natively provided regexp operators.
pregexp
The procedure pregexp takes a U-regexp, which is a string, and returns an S-regexp, which is a tree.
(pregexp "c.r") => (:sub (:or (:seq #\c :any #\r)))
There is rarely any need to look at the S-regexps returned by pregexp.
pregexp-match-positions
The procedure pregexp-match-positions takes a regexp pattern and a text string, and returns a match if the regexp matches (some part of) the text string.
The regexp may be either a U- or an S-regexp. (pregexp-match-positions will internally compile a U-regexp to an S-regexp before proceeding with the matching. If you find yourself calling pregexp-match-positions repeatedly with the same U-regexp, it may be advisable to explicitly convert the latter into an S-regexp once beforehand, using pregexp, to save needless recompilation.)
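The precompilation advice can be sketched as follows. This is a minimal illustration, assuming pregexp.scm sits in the current directory and can be brought in with load; needle-re and find-needle are hypothetical names chosen for the example, not part of pregexp.scm.

```scheme
;; Assumption: pregexp.scm is in the current directory.
(load "pregexp.scm")

;; Compile the U-regexp (a string) to an S-regexp (a tree) once, up front.
(define needle-re (pregexp "needle"))

;; find-needle is a hypothetical helper. Every call passes the
;; precompiled S-regexp, so the pattern is not recompiled per call.
(define (find-needle text)
  (pregexp-match-positions needle-re text))

;; Reuse the same S-regexp across several text strings.
(map find-needle '("hay needle stack" "bird"))
```

Passing the string "needle" directly would give the same results; the only difference is that the compilation step would be repeated on each call.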
pregexp-match-positions returns #f if the regexp did not match the string, and a list of index pairs if it did match. For example:
(pregexp-match-positions "brain" "bird") => #f
(pregexp-match-positions "needle" "hay needle stack") => ((4 . 10))
In the second example, the integers 4 and 10 identify the substring that was matched. 4 is the starting (inclusive) index and 10 the ending (exclusive) index of the matching substring.
(substring "hay needle stack" 4 10) => "needle"
Here, pregexp-match-positions’s return list contains only one index pair, and that pair represents the entire substring matched by the regexp. When we discuss subpatterns later, we will see how a single match operation can yield a list of submatches.
pregexp-match-positions takes optional third and fourth arguments that specify the indices of the text string within which the matching should take place.
(pregexp-match-positions "needle" "his hay needle stack -- my hay needle stack -- her hay needle stack" 24 43) => ((31 . 37))
Note that the returned indices are still reckoned relative to the full text string.
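Because the indices refer to the full text string, they can be handed straight to substring without any offset arithmetic. The following is a small sketch of that, assuming pregexp.scm is in the current directory; matched-substring is a hypothetical helper introduced only for this illustration.

```scheme
;; Assumption: pregexp.scm is in the current directory.
(load "pregexp.scm")

(define text
  "his hay needle stack -- my hay needle stack -- her hay needle stack")

;; matched-substring is a hypothetical helper. It searches str between
;; start and end and returns the text of the whole match, or #f.
(define (matched-substring pat str start end)
  (let ((m (pregexp-match-positions pat str start end)))
    ;; The first pair covers the entire match; its indices are
    ;; relative to str itself, not to the start argument.
    (and m (substring str (car (car m)) (cdr (car m))))))

(matched-substring "needle" text 24 43)  ; => "needle"
```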
pregexp-match
The procedure pregexp-match is called like pregexp-match-positions but instead of returning index pairs it returns the matching substrings:
(pregexp-match "brain" "bird") => #f
(pregexp-match "needle" "hay needle stack") => ("needle")
pregexp-match also takes optional third and fourth arguments, with the same meaning as does pregexp-match-positions.
pregexp-split
The procedure pregexp-split takes two arguments, a regexp pattern and a text string, and returns a list of substrings of the text string, where the pattern identifies the delimiter separating the substrings.
(pregexp-split ":" "/bin:/usr/bin:/usr/bin/X11:/usr/local/bin") => ("/bin" "/usr/bin" "/usr/bin/X11" "/usr/local/bin")
(pregexp-split " " "pea soup") => ("pea" "soup")
If the first argument can match an empty string, then the list of all the single-character substrings is returned.
(pregexp-split "" "smithereens") => ("s" "m" "i" "t" "h" "e" "r" "e" "e" "n" "s")
To identify one-or-more spaces as the delimiter, take care to use the regexp " +", not " *".
(pregexp-split " +" "split pea soup") => ("split" "pea" "soup")
(pregexp-split " *" "split pea soup") => ("s" "p" "l" "i" "t" "p" "e" "a" "s" "o" "u" "p")
pregexp-replace
The procedure pregexp-replace replaces the matched portion of the text string by another string. The first argument is the pattern, the second the text string, and the third is the insert string (string to be inserted).
(pregexp-replace "te" "liberte" "ty") => "liberty"
If the pattern doesn’t occur in the text string, the returned string is identical (eq?) to the text string.
pregexp-replace*
The procedure pregexp-replace* replaces all matches in the text string by the insert string:
(pregexp-replace* "te" "liberte egalite fraternite" "ty") => "liberty egality fratyrnity"
As with pregexp-replace, if the pattern doesn’t occur in the text string, the returned string is identical (eq?) to the text string.
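The difference between the two procedures is easiest to see when both are applied to the same text. The sketch below assumes pregexp.scm is in the current directory; motto is a hypothetical variable name used only here.

```scheme
;; Assumption: pregexp.scm is in the current directory.
(load "pregexp.scm")

(define motto "liberte egalite fraternite")

;; pregexp-replace rewrites only the first match.
(pregexp-replace "te" motto "ty")   ; => "liberty egalite fraternite"

;; pregexp-replace* rewrites every match.
(pregexp-replace* "te" motto "ty")  ; => "liberty egality fratyrnity"

;; With no match, the original string object is handed back,
;; so an eq? test can detect that nothing was replaced.
(eq? motto (pregexp-replace* "xyz" motto "ty"))
```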
pregexp-quote
The procedure pregexp-quote takes an arbitrary string and returns a U-regexp (string) that precisely represents it. In particular, characters in the input string that could serve as regexp metacharacters are escaped with a backslash, so that they safely match only themselves.
(pregexp-quote "cons") => "cons"
(pregexp-quote "list?") => "list\\?"
pregexp-quote is useful when building a composite regexp from a mix of regexp strings and verbatim strings.
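As an illustration of such a composite regexp, the sketch below joins a quoted verbatim string to an ordinary regexp fragment with string-append. It assumes pregexp.scm is in the current directory; name and call-re are hypothetical names for this example.

```scheme
;; Assumption: pregexp.scm is in the current directory.
(load "pregexp.scm")

;; A verbatim string that happens to contain the metacharacter ?.
(define name "list?")

;; Composite regexp: the literal name, then one or more spaces,
;; then one or more lowercase letters.
(define call-re
  (string-append (pregexp-quote name) " +[a-z]+"))

(pregexp-match call-re "(list? xs)")  ; => ("list? xs")

;; Without quoting, ? makes the preceding t optional rather than
;; matching a literal question mark, so the match fails here.
(pregexp-match (string-append name " +[a-z]+") "(list? xs)")  ; => #f
```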