Spec-Zone .ru
спецификации, руководства, описания, API
|
Table 12.9. String Regular Expression Operators
Name | Description |
---|---|
NOT
REGEXP |
Negation of REGEXP |
REGEXP |
Pattern matching using regular expressions |
RLIKE |
Synonym for REGEXP |
A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX
1003.2. MySQL uses the extended version to support pattern-matching operations performed with the REGEXP
operator in
SQL statements.
This section summarizes, with examples, the special characters and constructs that can be used in MySQL for REGEXP
operations. It
does not contain all the details that can be found in Henry Spencer's regex(7)
manual page. That manual page is included in MySQL source distributions, in the regex.7
file under the regex
directory. See also
Section 3.3.4.7,
"Pattern Matching".
,
expr
NOT REGEXP pat
expr
NOT RLIKE pat
This is the same as NOT (
. expr
REGEXP
pat
)
,
expr
REGEXP pat
expr
RLIKE pat
Performs a pattern match of a string expression expr
against a pattern pat
. The pattern can be an extended
regular expression. The syntax for regular expressions is discussed in Section
12.5.2, "Regular Expressions". Returns 1
if expr
matches pat
;
otherwise it returns 0
. If either expr
or pat
is NULL
, the result is NULL
. RLIKE
is
a synonym for REGEXP
, provided for mSQL
compatibility.
The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
Because MySQL uses the C escape syntax in strings (for example, "\n
" to
represent the newline character), you must double any "\
" that you use in your REGEXP
strings.
REGEXP
is not
case sensitive, except when used with binary strings.
mysql>SELECT 'Monty!' REGEXP '.*';
-> 1mysql>SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';
-> 1mysql>SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
-> 1 0mysql>SELECT 'a' REGEXP '^[a-d]';
-> 1
REGEXP
and RLIKE
use
the character set and collations of the arguments when deciding the type of a character and
performing the comparison. If the arguments have different character sets or collations,
coercibility rules apply as described in Section
10.1.7.5, "Collation of Expressions".
The REGEXP
and RLIKE
operators work in byte-wise fashion, so they are not
multi-byte safe and may produce unexpected results with multi-byte character sets. In addition,
these operators compare characters by their byte values and accented characters may not compare
as equal even if a given collation treats them as equal.
A regular expression describes a set of strings. The simplest regular expression is one that has no special
characters in it. For example, the regular expression hello
matches hello
and nothing else.
Nontrivial regular expressions use certain special constructs so that they can match more than one string. For
example, the regular expression hello|word
matches either the string hello
or the string word
.
As a more complex example, the regular expression B[an]*s
matches any of the
strings Bananas
, Baaaaas
, Bs
, and any other string starting with a B
, ending
with an s
, and containing any number of a
or n
characters in between.
A regular expression for the REGEXP
operator may use any of the following special characters and constructs:
^
Match the beginning of a string.
mysql>SELECT 'fo\nfo' REGEXP '^fo$';
-> 0mysql>SELECT 'fofo' REGEXP '^fo';
-> 1
$
Match the end of a string.
mysql>SELECT 'fo\no' REGEXP '^fo\no$';
-> 1mysql>SELECT 'fo\no' REGEXP '^fo$';
-> 0
.
Match any character (including carriage return and newline).
mysql>SELECT 'fofo' REGEXP '^f.*$';
-> 1mysql>SELECT 'fo\r\nfo' REGEXP '^f.*$';
-> 1
a*
Match any sequence of zero or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba*n';
-> 1mysql>SELECT 'Baaan' REGEXP '^Ba*n';
-> 1mysql>SELECT 'Bn' REGEXP '^Ba*n';
-> 1
a+
Match any sequence of one or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba+n';
-> 1mysql>SELECT 'Bn' REGEXP '^Ba+n';
-> 0
a?
Match either zero or one a
character.
mysql>SELECT 'Bn' REGEXP '^Ba?n';
-> 1mysql>SELECT 'Ban' REGEXP '^Ba?n';
-> 1mysql>SELECT 'Baan' REGEXP '^Ba?n';
-> 0
de|abc
Match either of the sequences de
or abc
.
mysql>SELECT 'pi' REGEXP 'pi|apa';
-> 1mysql>SELECT 'axe' REGEXP 'pi|apa';
-> 0mysql>SELECT 'apa' REGEXP 'pi|apa';
-> 1mysql>SELECT 'apa' REGEXP '^(pi|apa)$';
-> 1mysql>SELECT 'pi' REGEXP '^(pi|apa)$';
-> 1mysql>SELECT 'pix' REGEXP '^(pi|apa)$';
-> 0
(abc)*
Match zero or more instances of the sequence abc
.
mysql>SELECT 'pi' REGEXP '^(pi)*$';
-> 1mysql>SELECT 'pip' REGEXP '^(pi)*$';
-> 0mysql>SELECT 'pipi' REGEXP '^(pi)*$';
-> 1
{1}
, {2,3}
{n}
or {m,n}
notation provides a more
general way of writing regular expressions that match many occurrences of the previous atom (or
"piece") of the pattern. m
and n
are integers.
a*
Can be written as a{0,}
.
a+
Can be written as a{1,}
.
a?
Can be written as a{0,1}
.
To be more precise, a{n}
matches exactly n
instances of a
. a{n,}
matches n
or more instances of a
. a{m,n}
matches m
through n
instances of a
, inclusive.
m
and n
must be in the range from 0
to RE_DUP_MAX
(default 255), inclusive. If
both m
and n
are given, m
must be less than or equal to n
.
mysql>SELECT 'abcde' REGEXP 'a[bcd]{2}e';
-> 0mysql>SELECT 'abcde' REGEXP 'a[bcd]{3}e';
-> 1mysql>SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';
-> 1
[a-dX]
, [^a-dX]
Matches any character that is (or is not, if ^ is used) either a
, b
, c
, d
or
X
. A -
character between two other
characters forms a range that matches all characters from the first character to the second. For
example, [0-9]
matches any decimal digit. To include a literal ]
character, it must immediately follow the opening bracket [
. To include a literal -
character, it
must be written first or last. Any character that does not have a defined special meaning inside a
[]
pair matches only itself.
mysql>SELECT 'aXbc' REGEXP '[a-dXYZ]';
-> 1mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]$';
-> 0mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';
-> 1mysql>SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';
-> 0mysql>SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';
-> 1mysql>SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';
-> 0
[.characters.]
Within a bracket expression (written using [
and ]
),
matches the sequence of characters of that collating element. characters
is either a single character or a character name like
newline
. The following table lists the permissible character names.
The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name | Character | Name | Character |
---|---|---|---|
NUL |
0 |
SOH |
001 |
STX |
002 |
ETX |
003 |
EOT |
004 |
ENQ |
005 |
ACK |
006 |
BEL |
007 |
alert |
007 |
BS |
010 |
backspace |
'\b' |
HT |
011 |
tab |
'\t' |
LF |
012 |
newline |
'\n' |
VT |
013 |
vertical-tab |
'\v' |
FF |
014 |
form-feed |
'\f' |
CR |
015 |
carriage-return |
'\r' |
SO |
016 |
SI |
017 |
DLE |
020 |
DC1 |
021 |
DC2 |
022 |
DC3 |
023 |
DC4 |
024 |
NAK |
025 |
SYN |
026 |
ETB |
027 |
CAN |
030 |
EM |
031 |
SUB |
032 |
ESC |
033 |
IS4 |
034 |
FS |
034 |
IS3 |
035 |
GS |
035 |
IS2 |
036 |
RS |
036 |
IS1 |
037 |
US |
037 |
space |
' ' |
exclamation-mark |
'!' |
quotation-mark |
'"' |
number-sign |
'#' |
dollar-sign |
'$' |
percent-sign |
'%' |
ampersand |
'&' |
apostrophe |
'\'' |
left-parenthesis |
'(' |
right-parenthesis |
')' |
asterisk |
'*' |
plus-sign |
'+' |
comma |
',' |
hyphen |
'-' |
hyphen-minus |
'-' |
period |
'.' |
full-stop |
'.' |
slash |
'/' |
solidus |
'/' |
zero |
'0' |
one |
'1' |
two |
'2' |
three |
'3' |
four |
'4' |
five |
'5' |
six |
'6' |
seven |
'7' |
eight |
'8' |
nine |
'9' |
colon |
':' |
semicolon |
';' |
less-than-sign |
'<' |
equals-sign |
'=' |
greater-than-sign |
'>' |
question-mark |
'?' |
commercial-at |
'@' |
left-square-bracket |
'[' |
backslash |
'\\' |
reverse-solidus |
'\\' |
right-square-bracket |
']' |
circumflex |
'^' |
circumflex-accent |
'^' |
underscore |
'_' |
low-line |
'_' |
grave-accent |
'`' |
left-brace |
'{' |
left-curly-bracket |
'{' |
vertical-line |
'|' |
right-brace |
'}' |
right-curly-bracket |
'}' |
tilde |
'~' |
DEL |
177 |
mysql>SELECT '~' REGEXP '[[.~.]]';
-> 1mysql>SELECT '~' REGEXP '[[.tilde.]]';
-> 1
[=character_class=]
Within a bracket expression (written using [
and ]
),
[=character_class=]
represents an equivalence class. It matches all
characters with the same collation value, including itself. For example, if o
and (+)
are the members of an
equivalence class, [[=o=]]
, [[=(+)=]]
, and
[o(+)]
are all synonymous. An equivalence class may not be used as an
endpoint of a range.
[:character_class:]
Within a bracket expression (written using [
and ]
),
[:character_class:]
represents a character class that matches all
characters belonging to that class. The following table lists the standard class names. These names
stand for the character classes defined in the ctype(3)
manual page. A
particular locale may provide other class names. A character class may not be used as an endpoint of
a range.
Character Class Name | Meaning |
---|---|
alnum |
Alphanumeric characters |
alpha |
Alphabetic characters |
blank |
Whitespace characters |
cntrl |
Control characters |
digit |
Digit characters |
graph |
Graphic characters |
lower |
Lowercase alphabetic characters |
print |
Graphic or space characters |
punct |
Punctuation characters |
space |
Space, tab, newline, and carriage return |
upper |
Uppercase alphabetic characters |
xdigit |
Hexadecimal digit characters |
mysql>SELECT 'justalnums' REGEXP '[[:alnum:]]+';
-> 1mysql>SELECT '!!' REGEXP '[[:alnum:]]+';
-> 0
[[:<:]]
, [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A
word is a sequence of word characters that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum
class or an
underscore (_
).
mysql>SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';
-> 1mysql>SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';
-> 0
To use a literal instance of a special character in a regular expression, precede it by two backslash (\)
characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets
the other. For example, to match the string 1+2
that contains the special +
character, only the last of the following regular expressions is the correct
one:
mysql>SELECT '1+2' REGEXP '1+2';
-> 0mysql>SELECT '1+2' REGEXP '1\+2';
-> 0mysql>SELECT '1+2' REGEXP '1\\+2';
-> 1