Difference between revisions of "Perl regex"

Latest revision as of 13:34, 16 July 2009

Overview

This article explains regular expressions in terms understandable to mere mortals, and also how to use them in Perl.

Related Articles

  • regex: manpage documentation

Details

Special characters in regex:

  • . = any character except newline (\n) (if /s option is included, then \n is also matched)
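  • * = 0 or more of previous character or group
  • ^ = following pattern begins the string (or begins a line, if the /m option is included); inside brackets, [^...] means "not these characters"
  • $ = preceding pattern ends the string (or ends a line, if the /m option is included)
  • [] = list of characters, any one of which can satisfy the match at this position
    • Note: most special characters, such as ".", lose their special meaning inside the list and match literally
  • {} = # of repetitions of previous character or group: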
    • {x} -> exactly x repetitions
    • {x,y} -> minimum of x repetitions, maximum of y repetitions
  • | = alternatives
  • + = 1 or more of previous character
  • ? = zero or one of the preceding character or group, e.g. "the?" matches both "th" and "the"
    • ? after +, *, ?, or {} indicates non-greedy behavior, i.e. match the fewest characters, not the most
  • a-b (inside []) = range of characters from a to b, e.g. "[t-w]" means any of t,u,v,w in that position
  • (?=...) = lookahead: a(?=b) matches "a, but only if it's followed by a b"; the a becomes part of the matched sequence, but the b does not
  • (?<=...) = lookbehind: (?<=a)b matches "b, but only if it's preceded by an a"; the b becomes part of the matched sequence, but the a does not
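
The behavior of several of these special characters is easier to see on concrete input. The following is a minimal sketch, not a definitive reference; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text for illustration.
my $text = "<b>one</b> and <b>two</b>";

# Greedy: .+ takes as much as it can, so the match runs to the LAST </b>.
my ($greedy) = $text =~ /<b>(.+)<\/b>/;

# Non-greedy: .+? takes as little as it can, stopping at the FIRST </b>.
my ($lazy) = $text =~ /<b>(.+?)<\/b>/;

# "?" on its own: zero or one of the preceding character.
my @the = grep { /^the?$/ } qw(th the thee);

# Lookahead: replace "foo" only when "bar" follows; "bar" itself is untouched.
(my $ahead = "foobar foobaz") =~ s/foo(?=bar)/X/g;

# Lookbehind: capture digits only when a "$" precedes them.
my ($price) = 'cost: $42, qty 7' =~ /(?<=\$)(\d+)/;

# Without /s, "." stops at a newline; with /s it matches across it.
my $lines = "first\nsecond";
my ($no_s)   = $lines =~ /^(.+)/;
my ($with_s) = $lines =~ /^(.+)/s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "the?:   @the\n";
print "ahead:  $ahead\n";
print "price:  $price\n";
```

Here $greedy ends up holding everything between the first <b> and the last </b>, while $lazy holds only "one".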

Operators used to invoke regex:

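  • =~ applies a pattern to a string; returns TRUE if pattern matches
  • !~ is the negation of =~; returns FALSE if pattern matches, TRUE if it does not
  • m/ searches a string for a pattern match; in scalar context it returns true/false, and in list context it returns a list of the substrings captured by () (if () are used). Options: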
    • c don't reset pos on failed matches when using /g
    • g (Global) all occurrences – repeat the pattern search until there are no more matches
    • i case-Insensitive
    • m Multiline mode - ^ and $ match internal lines
    • o compile pattern Once
    • s match as a Single line - . (dot) matches \n
    • x eXtended legibility - free whitespace and comments
  • s/pattern/replacement/gi; replaces pattern with replacement
    • all m/ options are available
    • e Evaluate replacement as an expression. May be specified multiple times. 'replacement' is interpreted as a double quoted string unless a single-quote (') is the delimiter.
  • y/searchlist/replacelist/: replaces each character found in searchlist with the corresponding character in replacelist
    • d deletes characters found in searchlist that have no corresponding character in replacelist
  • tr/ is the same as y/
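
The operators and options above can be sketched in a few lines. This is only an illustration under assumed inputs; the strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical input line for illustration.
my $line = "Name: Alice, Age: 30";

# =~ in boolean context: true if the pattern matches (/i ignores case).
my $has_name = ($line =~ /name/i) ? 1 : 0;

# !~ is the negation: true if the pattern does NOT match.
my $no_at_sign = ($line !~ /\@/) ? 1 : 0;

# m// in list context returns the text captured by each ().
my ($name, $age) = $line =~ m/Name:\s*(\w+),\s*Age:\s*(\d+)/;

# /g in list context returns every match, not just the first.
my @numbers = "a1 b22 c333" =~ /(\d+)/g;

# /x allows whitespace and comments inside the pattern.
my ($year) = "released 2009-07-16" =~ /
    (\d{4})     # four-digit year
    -\d{2}      # month
    -\d{2}      # day
/x;

# s///e evaluates the replacement as Perl code; /g repeats it for each match.
(my $doubled = "3 apples, 5 pears") =~ s/(\d+)/$1 * 2/ge;

# tr/// maps characters one-for-one.
(my $upper = "hello") =~ tr/a-z/A-Z/;

# With /d, characters that have no counterpart in the replacement list are deleted.
(my $no_vowels = "regular expression") =~ tr/aeiou//d;

print "$name is $age; numbers: @numbers; year: $year\n";
print "$doubled | $upper | $no_vowels\n";
```

The `(my $copy = $original) =~ s/.../.../` idiom used here edits a copy and leaves the original string untouched.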

Examples

These examples have been tested only briefly.

  • Replace "thingy" with "stuffs" in $string:
    • $string =~ s/thingy/stuffs/;
  • Keep only the part of $string before the final "/" (using "|" as the delimiter instead of "/"):
    • $string =~ s|(.*)/[^/]*|$1|;
  • ...after the final "/":
    • $string =~ s|^.*/([^/]*)$|$1|;
  • ...before the final "-":
    • $string =~ s|(.*)-[^-]*|$1|;
  • ...before the final ".":
    • $string =~ s|(.*)\.[^\.]*|$1|;
  • ...after the final "." (both of these return the full string if no "." is found):
    • $string =~ s|^.+\.(.+$)|$1|;
    • $string =~ s|^.*\.([^\.]*)$|$1|;
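
To see what these substitutions do on real input, the sketch below applies them to a made-up path. The path and the variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample path for illustration.
my $path = "/home/user/archive.tar.gz";

(my $dir  = $path) =~ s|(.*)/[^/]*|$1|;       # before the final "/"
(my $file = $path) =~ s|^.*/([^/]*)$|$1|;     # after the final "/"
(my $stem = $file) =~ s|(.*)\.[^\.]*|$1|;     # before the final "."
(my $ext  = $file) =~ s|^.*\.([^\.]*)$|$1|;   # after the final "."

# With no "." present the pattern fails to match, so the string is unchanged.
(my $plain = "README") =~ s|^.*\.([^\.]*)$|$1|;

print "dir:   $dir\n";
print "file:  $file\n";
print "stem:  $stem\n";
print "ext:   $ext\n";
print "plain: $plain\n";
```

Because .* is greedy, each pattern splits at the final "/" or "." rather than the first one.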

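Links

Reference

  • Regular Expressions (Perl-Compatible) in PHP: http://php.net/pcre
    • PHP also supports POSIX regex (http://php.net/regex) via the ereg*() series of functions (http://php.net/ereg), but these are deprecated (as of PHP 5.3); to convert a simple pattern from ereg* to PCRE, add delimiters such as slashes around it: /regex expression/

Articles

  • 2007-10-24 A graphical tracer for Perl 6 regexes based on PCR (http://pugs.blogs.com/pugs/2007/10/a-graphical-tra.html): code for a graphical regex tracer written in Perl 6, with links to some demo sites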