Difference between revisions of "Perl regex"

Latest revision as of 13:34, 16 July 2009

Overview

This article explains regular expressions in terms understandable to mere mortals, and also how to use them in Perl.

Details

Special characters in regex:

. = any character except newline (\n) (if /s option is included, then \n is also matched)
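* = 0 or more of the previous character or group
^ = matches at the start of the string, or at the start of each line if the /m option is used (except [^...], which means "not these characters")
$ = matches at the end of the string (or just before a final newline), or at the end of each line if the /m option is used
[] = list of characters which can satisfy the match at this position
- Note: most special characters, such as ".", lose their special meaning inside the list and match literally
{} = number of repetitions of the previous character or group:
- {x} -> exactly x repetitions
- {x,} -> at least x repetitions
- {x,y} -> minimum of x repetitions, maximum of y repetitions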
| = alternatives
+ = 1 or more of previous character
? = 0 or 1 of the previous character or group, e.g. "the?" matches both "th" and "the"
- after +, *, ?, or {}, it instead indicates non-greedy behavior, i.e. match the fewest characters, not the most
a-b = range of characters from a to b; only has this meaning inside [], e.g. "[t-w]" means any of t,u,v,w in that position
(?=...) = lookahead: a(?=b) matches "a, but only if it's followed by a b"; the a becomes part of the matched sequence, but the b does not
(?<=...) = lookbehind: (?<=a)b matches "b, but only if it's preceded by an a"; the b becomes part of the matched sequence, but the a does not
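
The following is a minimal sketch of how a few of these special characters behave in practice. The sample strings and variable names are made up for illustration and are not part of the original article.

```perl
use strict;
use warnings;

# Hypothetical sample text for illustration.
my $text = "<b>one</b> and <b>two</b>";

# Greedy: .* takes as much as it can, so the match runs to the LAST </b>.
my ($greedy) = $text =~ m{<b>(.*)</b>};    # "one</b> and <b>two"

# Non-greedy: .*? takes as little as it can, stopping at the FIRST </b>.
my ($lazy) = $text =~ m{<b>(.*?)</b>};     # "one"

# Inside [], "." is a literal dot, not "any character".
my $dot_matches = ("a.b" =~ /^a[.]b$/) ? 1 : 0;    # 1
my $x_matches   = ("axb" =~ /^a[.]b$/) ? 1 : 0;    # 0

# Lookahead: "foo" matches only when followed by "bar",
# but "bar" is not part of the matched text.
my ($ahead) = "foobar" =~ /(foo(?=bar))/;          # "foo"

# Lookbehind: digits match only when preceded by a literal "$",
# but the "$" is not part of the matched text.
my ($behind) = 'cost: $42' =~ /((?<=\$)\d+)/;      # "42"

print "$greedy\n$lazy\n$dot_matches $x_matches\n$ahead\n$behind\n";
```

Each match is done in list context so that the captured group can be assigned directly to a variable.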

Operators used to invoke regex:

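=~ applies the pattern on the right to the string on the left; returns TRUE if the pattern matches
!~ is the negation of =~; returns TRUE if the pattern does not match
m/ searches a string for a pattern match; returns true/false in scalar context, and a list of the captured matches in list context (if () are used)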
- c don't reset pos on failed matches when using /g
- g (Global) all occurrences – repeat the pattern search until there are no more matches
- i case-Insensitive
- m Multiline mode - ^ and $ match internal lines
- o compile pattern Once
- s match as a Single line - . (dot) matches \n
- x eXtended legibility - free whitespace and comments
s/pattern/replacement/gi; replaces pattern with replacement
- all m/ options are available
- e Evaluate replacement as an expression. May be specified multiple times. 'replacement' is interpreted as a double quoted string unless a single-quote (') is the delimiter.
y/searchlist/replacelist/d: replaces each character found in searchlist with the corresponding character in replacelist
- d deletes characters found in searchlist that have no corresponding character in replacelist
tr/ is the same as y/
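
As a rough illustration of the operators and options listed above, here is a short sketch. All strings and variable names are hypothetical examples chosen for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical input line for illustration.
my $line = "Name: Alice, Age: 30";

# m// in list context returns the captured groups.
my ($name, $age) = $line =~ /Name:\s*(\w+),\s*Age:\s*(\d+)/;

# !~ is true when the pattern does NOT match.
my $no_digits = ("abc" !~ /\d/) ? 1 : 0;              # 1

# /g in list context returns every match; /i ignores case.
my @cats = "Cat cat CAT dog" =~ /cat/gi;              # ("Cat", "cat", "CAT")

# /x allows whitespace and comments inside the pattern.
my ($year) = "2009-07-16" =~ /
    ^ (\d{4})    # four-digit year (captured)
    - \d{2}      # month
    - \d{2} $    # day
/x;

# s///e evaluates the replacement as Perl code; /g repeats it for each match.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

# tr with /d deletes listed characters that have no replacement.
(my $stripped = "a-b-c") =~ tr/-//d;                  # "abc"

# tr without options maps characters one-to-one.
(my $upper = "perl") =~ tr/a-z/A-Z/;                  # "PERL"

print "$name $age $no_digits @cats $year\n$doubled\n$stripped $upper\n";
```

The `(my $copy = $original) =~ ...` idiom is used so that the substitution or transliteration changes a copy rather than the original string.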

Examples

These examples have been tested only briefly.

Replace "thingy" with "stuffs" in $string:
- $string =~ s/thingy/stuffs/;
Keep only the part of $string before the final "/" (using "|" as the delimiter instead of "/"):
- $string =~ s|(.*)/[^/]*|$1|;
...after the final "/":
- $string =~ s|^.*/([^/]*)$|$1|;
...before the final "-":
- $string =~ s|(.*)-[^-]*|$1|;
...before the final ".":
- $string =~ s|(.*)\.[^\.]*|$1|;
...after the final "." (both of these leave the string unchanged if no "." is found; the first also requires at least one character on each side of the "."):
- $string =~ s|^.+\.(.+$)|$1|;
- $string =~ s|^.*\.([^\.]*)$|$1|;
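
To see what these substitutions produce, the sketch below applies them to a made-up path (the path and variable names are assumptions for illustration only). Each substitution works on a copy, so the original string is left intact.

```perl
use strict;
use warnings;

# Hypothetical path for illustration.
my $path = "/usr/local/lib/archive.tar.gz";

# Part before the final "/".
(my $dir = $path) =~ s|(.*)/[^/]*|$1|;            # "/usr/local/lib"

# Part after the final "/".
(my $file = $path) =~ s|^.*/([^/]*)$|$1|;         # "archive.tar.gz"

# Part before the final ".".
(my $stem = $file) =~ s|(.*)\.[^.]*|$1|;          # "archive.tar"

# Part after the final ".".
(my $ext = $file) =~ s|^.*\.([^.]*)$|$1|;         # "gz"

# With no "." present, the pattern fails and the string is unchanged.
(my $nodot = "README") =~ s|^.*\.([^.]*)$|$1|;    # "README"

print "$dir\n$file\n$stem\n$ext\n$nodot\n";
```

These patterns rely on the greedy `.*` consuming as much as possible, which is what pins the match to the final separator rather than the first.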

Links

Reference

Regular Expressions (Perl-Compatible) in PHP
- PHP also supports POSIX regex via the ereg*() series of functions, but these were deprecated in PHP 5.3 and removed in PHP 7; to convert a pattern from ereg* to PCRE, add delimiters such as slashes around it: /regex expression/

Articles

2007-10-24 A graphical tracer for Perl 6 regexes based on PCR: code for a graphical regex tracer written in Perl 6, with links to some demo sites

@@ Line 1: / Line 1: @@
-[[Category:Techniques]]
+=={{hide|Navbar}}==
-[[Techniques]]: [[Perl]]: [[Perl regex|regex]]
+[[computing]]: [[programming]]: [[Perl]]: [[Perl regex|regex]]
+==Overview==
 This article explains regular expressions in terms understandable to mere mortals, and also how to use them in Perl.
 ==Related Articles==
-*[[regex]]: manpage documentation
+*{{manpagelink|regex}}: manpage documentation
 ==Details==
 Special characters in regex:
-* . = any character
+* '''.''' = any character except newline (\n) (if /s option is included, then \n is also matched)
-* * = 0 or more of previous character
+* '''*''' = 0 or more of previous character
-* ^ = following string begins the line (except [^...] means "not these characters")
+* '''^''' = following string begins the line (except [^...] means "not these characters")
-* $ = preceding string ends the line
+* '''$''' = preceding string ends the line
-* [] = list of characters which can satisfy the match at this position
+* '''[]''' = list of characters which can satisfy the match at this position
-* {} = # of repetitions of previous character
+** Note: cannot use special characters like "." as part of list
-* | = alternatives
+* '''{}''' = # of repetitions of previous character:
-* + = 1 or more of previous character
+** {<u>x</u>} -> exactly <u>x</u> repetitions
-* ''a''-''b'' = range of characters from ''a'' to ''b'' (must be inside [] to be position-sensitive?)
+** {<u>x</u>,<u>y</u>} -> minimum of <u>x</u> repetitions, maximum of <u>y</u> repetitions
+* '''|''' = alternatives
+* '''+''' = 1 or more of previous character
+* '''?''' after '''+''', '''*''', or '''{}''' indicates non-greedy behavior, i.e. match the fewest characters, not the most
+** Apparently it can also mean "either zero or one of the preceding group or character", e.g. "the?" matches both "th" and "the"
+* <u>a</u>'''-'''<u>b</u> = range of characters from <u>a</u> to <u>b</u>, e.g. "t-w" means any of t,u,v,w in that position
+* '''?=''' = lookahead (need explanation of how this works) <u>a</u>'''(?='''<u>b</u>''')''' returns "<u>a</u>, but only if it's followed by a <u>b</u>"; the <u>a</u> becomes part of the matched sequence, but the <u>b</u> does not
+* '''?&lt;=''' = reverse lookahead (need explanation of how this works)
 Operators used to invoke regex:
-* =~ returns TRUE if pattern matches
+* '''=~''' returns TRUE if pattern matches
-* !~ returns FALSE if pattern matches
+* '''!~''' returns FALSE if pattern matches
-* s/''pattern''/''replacement''/ replaces ''pattern'' with ''replacement''
+* '''m/''' searches a string for a pattern match; returns true/false scalar and an array of matches (if () are used)
+** '''c''' don't reset pos on failed matches when using /g
+** '''g''' ('''G'''lobal) all occurrences &ndash; repeat the pattern search until there are no more matches
+** '''i''' case-'''I'''nsensitive
+** '''m''' '''M'''ultiline mode - ^ and $ match internal lines
+** '''o''' compile pattern '''O'''nce
+** '''s''' match as a '''S'''ingle line - . (dot) matches \n
+** '''x''' e'''X'''tended legibility - free whitespace and comments
+* '''s/'''<u>pattern</u>'''/'''<u>replacement</u>'''/'''''gi''; replaces ''pattern'' with ''replacement''
+** all m/ options are available
+** '''e''' '''E'''valuate replacement as an expression. May be specified multiple times. 'replacement' is interpreted as a double quoted string unless a single-quote (') is the delimiter.
+* '''y/'''<u>searchlist</u>'''/'''<u>replacelist</u>'''/'''''d'': replaces each character found in <u>searchlist</u> with the corresponding character in <u>replacelist</u>
+** '''d''' just deletes matching characters
+* '''tr/''' is the same as '''y/'''
 ==Examples==
+These examples have been tested only briefly.
 * Replace "thingy" with "stuffs" in $string:
-** $string = s/thingy/stuffs/;
+** $string =~ s/thingy/stuffs/;
+* Keep only the part of $string before the final "/" (using "|" as the delimiter instead of "/"):
+** $string =~ s|(.*)/[^/]*|$1|;
+* ...after the final "/":
+** $string =~ s|
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
^.*/([^/]*)$|$1|;
+* ...before the final "-":
+** $string =~ s|(.*)-[^-]*|$1|;
+* ...before the final ".":
+** $string =~ s|(.*)\.[^\.]*|$1|;
+* ...''after'' the final "." (both of these leave the string unchanged if no "." is found):
+** $string =~ s|^.+\.(.+$)|$1|;
+** $string =~ s|^.*\.([^\.]*)$|$1|;
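The substitutions above can be tried out with a short script. This is only a minimal sketch: the sample string and the variable names (<code>$path</code>, <code>$before_dash</code>, and so on) are made up for illustration, and the <code>(my $copy = $original) =~ s|...|...|</code> idiom is used so that the original string is left untouched.

```perl
use strict;
use warnings;

# Hypothetical sample value, chosen only to exercise the patterns.
my $path = 'archive/backup-2009-07.tar.gz';

# Keep everything before the final "-".
# The greedy (.*) runs as far right as it can while still leaving a "-".
(my $before_dash = $path) =~ s|(.*)-[^-]*|$1|;

# Keep everything before the final ".".
(my $before_dot = $path) =~ s|(.*)\.[^\.]*|$1|;

# Keep everything after the final ".".
(my $after_dot = $path) =~ s|^.*\.([^\.]*)$|$1|;

# With no "." in the string the pattern does not match,
# so the substitution leaves the string as it was.
(my $no_dot = 'README') =~ s|^.*\.([^\.]*)$|$1|;

print "$before_dash\n";   # archive/backup-2009
print "$before_dot\n";    # archive/backup-2009-07.tar
print "$after_dot\n";     # gz
print "$no_dot\n";        # README
```

In each case the greedy <code>.*</code> is what makes the match land on the ''final'' separator rather than the first one.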
+==Links==
+===Reference===
+* [http://php.net/pcre Regular Expressions (Perl-Compatible)] in [[PHP]]
+** PHP also supports [http://php.net/regex POSIX regex] via the [http://php.net/ereg ereg*()] series of functions, but these are deprecated as of PHP 5.3; to convert code from ereg* to PCRE, add delimiters (typically slashes) around the pattern: /<u>regular expression</u>/
+===Articles===
+* '''2007-10-24''' [http://pugs.blogs.com/pugs/2007/10/a-graphical-tra.html A graphical tracer for Perl 6 regexes based on PCR]: code for a graphical regex tracer written in [[Perl 6]], with links to some demo sites

Contents