Perl regex
This article explains regular expressions in terms understandable to mere mortals, and also how to use them in Perl.
Related Articles
- regex: manpage documentation
Details
Special characters in regex:
- . = any character except newline (\n). To match any character including newline, add the /s modifier or use a class such as [\s\S]. Inside brackets, [.] is a literal period.
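- * = 0 or more of the previous character or group
- ^ = what follows must begin the string (or the line, with the /m modifier); as the first character inside brackets, [^...] means "any character except these"
- $ = what precedes must end the string, optionally before a single trailing newline (or end the line, with the /m modifier)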
- [] = list of characters which can satisfy the match at this position
- {} = number of repetitions of the previous character or group:
- {x} -> exactly x repetitions
- {x,} -> at least x repetitions
- {x,y} -> minimum of x repetitions, maximum of y repetitions
- | = alternatives
- + = 1 or more of the previous character or group
- ? = 0 or 1 of the previous character or group; placed after +, *, or {} it instead indicates non-greedy behavior, i.e. match the fewest characters, not the most
- a-b (inside []) = range of characters from a to b, e.g. [t-w] means any of t,u,v,w in that position
- (?=...) = lookahead: a(?=b) matches "a" only if it is immediately followed by "b"; the a becomes part of the matched sequence, but the b does not
- (?<=...) = lookbehind: (?<=a)b matches "b" only if it is immediately preceded by "a"; the b becomes part of the matched sequence, but the a does not
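The special characters above can be tried out with a short script. This is only a minimal sketch: the sample strings and variable names are made up for illustration, and the comments show what each capture is expected to hold.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "price: 100 USD, 250 EUR";
my $html = "<b>one</b><b>two</b>";

# Greedy vs. non-greedy: .* takes as much as it can, .*? as little as it can.
my ($greedy) = $html =~ /<b>(.*)<\/b>/;     # one</b><b>two
my ($lazy)   = $html =~ /<b>(.*?)<\/b>/;    # one

# Character class with a range, plus {x,y} repetition: 2 to 4 digits.
my ($digits) = $text =~ /([0-9]{2,4})/;     # 100

# Lookahead: digits only if followed by " EUR"; " EUR" is not part of the match.
my ($eur) = $text =~ /([0-9]+)(?= EUR)/;    # 250

# Lookbehind: word characters only if preceded by "250 ".
my ($after) = $text =~ /(?<=250 )(\w+)/;    # EUR

# [.] is a literal period, while a bare . matches any character except newline.
my $dot_literal = ("abc" =~ /^a[.]c$/) ? 1 : 0;   # 0
my $dot_any     = ("abc" =~ /^a.c$/)   ? 1 : 0;   # 1

print "$greedy\n$lazy\n$digits\n$eur\n$after\n$dot_literal\n$dot_any\n";
```

Capturing in list context, as in my ($x) = $text =~ /(...)/, returns the contents of the parentheses without modifying $text.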
Operators used to invoke regex:
- =~ returns TRUE if pattern matches
- !~ is the negation of =~: it returns TRUE if the pattern does not match and FALSE if it does
- s/pattern/replacement/gi; replaces pattern with replacement
- g (global) means replace every match in the string, not just the first one
- i (insensitive) means alphabetic matches are checked case-insensitively
- y/searchlist/replacelist/d: replaces each character found in searchlist with the corresponding character in replacelist
- d deletes any character found in searchlist that has no corresponding character in replacelist
- tr/ is the same as y/
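As a rough illustration of the operators above, the following sketch applies each of them to made-up sample strings (all variable names are hypothetical). Each substitution works on a copy, using the (my $copy = $original) =~ ... idiom, so the original string is left alone.

```perl
use strict;
use warnings;

my $line = "The Cat sat on the cat mat";   # hypothetical sample data

# =~ is true when the pattern matches; !~ is true when it does not.
my $has_cat = ($line =~ /cat/) ? 1 : 0;    # 1
my $no_dog  = ($line !~ /dog/) ? 1 : 0;    # 1

# s///gi: replace every match (g), ignoring case (i).
(my $all = $line) =~ s/cat/dog/gi;         # The dog sat on the dog mat

# Without g and i, only the first case-sensitive match is replaced.
(my $first = $line) =~ s/cat/dog/;         # The Cat sat on the dog mat

# tr/// (same as y///) maps characters one-to-one.
(my $upper = "abcabc") =~ tr/a-c/A-C/;     # ABCABC

# With d, characters in the search list with no counterpart are deleted.
(my $mapped   = "aabbcc") =~ tr/abc/AB/d;  # AABB
(my $stripped = "a-b-c")  =~ tr/-//d;      # abc

print "$has_cat $no_dog\n$all\n$first\n$upper\n$mapped\n$stripped\n";
```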
Examples
These examples have been tested only briefly.
- Replace "thingy" with "stuffs" in $string:
- $string =~ s/thingy/stuffs/;
- Keep only the part of $string before the final "/" (using "|" as the delimiter instead of "/"):
- $string =~ s|(.*)/[^/]*|$1|;
- ...after the final "/":
- $string =~ s|^.*/([^/]*)$|$1|;
- ...before the final "-":
- $string =~ s|(.*)-[^-]*|$1|;
- ...before the final ".":
- $string =~ s|(.*)\.[^\.]*|$1|;
- ...after the final "." (both of these leave the string unchanged if no "." is found; the first also leaves it unchanged if nothing precedes or follows the final "."):
- $string =~ s|^.+\.(.+$)|$1|;
- $string =~ s|^.*\.([^\.]*)$|$1|;
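To see the substitutions above side by side, here is a small sketch that applies them to a single made-up path. The path and variable names are assumptions for illustration only, and the leading-anchor pattern is written without any space before the ^.

```perl
use strict;
use warnings;

my $path = "/home/user/my-notes/report-final.tar.gz";   # hypothetical path

# Each substitution works on a copy, so $path itself is left untouched.
(my $dir      = $path) =~ s|(.*)/[^/]*|$1|;        # /home/user/my-notes
(my $file     = $path) =~ s|^.*/([^/]*)$|$1|;      # report-final.tar.gz
(my $pre_dash = $file) =~ s|(.*)-[^-]*|$1|;        # report
(my $pre_dot  = $file) =~ s|(.*)\.[^\.]*|$1|;      # report-final.tar
(my $ext      = $file) =~ s|^.*\.([^\.]*)$|$1|;    # gz

# No "." in the input: the pattern does not match, so the string is unchanged.
(my $no_ext = "README") =~ s|^.*\.([^\.]*)$|$1|;   # README

print "$dir\n$file\n$pre_dash\n$pre_dot\n$ext\n$no_ext\n";
```

Because .* is greedy, each pattern splits at the last occurrence of the separator rather than the first.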