Limitations of the Basic Syntax
.+@.+\..+
can be used to indicate:
At least one instance of any character, followed by
The "@" character, followed by
At least one instance of any character, followed by
The "." character, followed by
At least one instance of any character.
As you might have guessed, this expression is a very rough form of email address validation. Note how I have used the backslash character (\) to force the regex compiler to interpret the penultimate "." as a literal character, rather than as another instance of the "any character" metacharacter.
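To see the effect of that backslash, here is a small sketch using Python's standard `re` module. The choice of Python, the helper name `matches`, the sample strings, and the use of `re.fullmatch` (which requires the whole string to match rather than just a part of it) are all assumptions made for illustration. The sketch compares the pattern with and without the escape:

```python
import re

# The rough pattern from the text: "\." matches only a literal dot.
ROUGH = r".+@.+\..+"
# The same pattern without the backslash: every "." means "any character".
UNESCAPED = r".+@.+..+"

def matches(pattern: str, text: str) -> bool:
    # fullmatch succeeds only if the entire string fits the pattern.
    return re.fullmatch(pattern, text) is not None

for candidate in ["john@example.com", "john@examplecom", "not an email!@x.y"]:
    print(candidate, matches(ROUGH, candidate), matches(UNESCAPED, candidate))
```

Without the escape, "john@examplecom" is accepted even though it has no dot at all. The last candidate shows why the check is only rough: it is accepted although it contains spaces and an exclamation mark.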
However, that is a rather primitive way of checking the validity of an email address. After all, only letters of the alphabet, the underscore character (_), the minus character (-), and digits are allowed in the name, domain, and extension portions of an email address. This is where range specifications come into play.
As mentioned previously, anything within nonescaped square brackets represents a set of alternatives for a particular character position. For example, [abc] indicates either an "a", a "b", or a "c". However, representing something like "any letter" by including every possible symbol in the square brackets would give birth to some ridiculously long regular expressions, and regexes are complex enough as it is.
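As a quick illustration, the following Python sketch (the helper name `is_abc` is hypothetical) checks single characters against the set [abc]. It shows that a bracketed set stands for exactly one character position:

```python
import re

# [abc] stands for exactly one character position: "a", "b" or "c".
ONE_OF_ABC = re.compile(r"[abc]")

def is_abc(text: str) -> bool:
    # fullmatch requires the whole string to be a single character from the set.
    return ONE_OF_ABC.fullmatch(text) is not None

print(is_abc("b"))   # True
print(is_abc("d"))   # False: "d" is not in the set
print(is_abc("ab"))  # False: the set covers only one position
```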
Luckily, it's possible to specify a "range" of characters by separating its first and last characters with a dash. For example, [a-z] means "any lowercase letter." You can also specify more than one range and combine ranges with individual characters by placing them side by side. For example, our email validation requirements can be satisfied by the expression [A-Za-z0-9_-] (the dash is placed last so that it is read as a literal minus character rather than as a range separator), which turns the overall regex into
[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
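A minimal Python sketch of ranges and of the character-class version of the email check follows. The names `LOWER`, `PART`, `EMAIL`, and `looks_like_email` are hypothetical. The class includes the minus character, which the text lists as valid, by writing `-` last so that it is taken literally:

```python
import re

# [a-z] is a range: any single lowercase letter from "a" to "z".
LOWER = re.compile(r"[a-z]")

# Ranges and single characters can sit side by side in one class.
# "-" is written last so it is a literal minus, not a range separator.
PART = r"[A-Za-z0-9_-]+"
EMAIL = re.compile(PART + r"@" + PART + r"\." + PART)

def looks_like_email(text: str) -> bool:
    # fullmatch: the whole string must fit name@domain.extension.
    return EMAIL.fullmatch(text) is not None

print(bool(LOWER.fullmatch("q")), bool(LOWER.fullmatch("Q")))  # True False
print(looks_like_email("jane_doe@my-site.com"))  # True
print(looks_like_email("not an email!@x.y"))     # False: space and "!" are rejected
```

Because the dot is not part of the class, this version also rejects domains containing more than one dot, such as mail.example.com.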
The range specifications that we have seen so far are all inclusive; that is, they tell the regex compiler which characters can be in the string. Sometimes, it's more convenient to use exclusive specifications, dictating that any character except the ones you specify is valid. This can be done by placing a caret character (^) immediately after the opening square bracket. For example, [^A-Z] means "any character except an uppercase letter of the alphabet."
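The negated class can be tried out with a short Python sketch (the name `NOT_UPPER` and the sample strings are illustrative assumptions):

```python
import re

# A caret right after "[" negates the class: [^A-Z] is any single
# character that is NOT an uppercase letter from "A" to "Z".
NOT_UPPER = re.compile(r"[^A-Z]")

print(bool(NOT_UPPER.fullmatch("a")))  # True
print(bool(NOT_UPPER.fullmatch("7")))  # True
print(bool(NOT_UPPER.fullmatch("Q")))  # False
# findall collects every run of non-uppercase characters.
print(re.findall(r"[^A-Z]+", "HELLO world"))  # [' world']
```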
Going back to the email validation regex, it's still not as good as it could be. For example, we can assume that a domain extension (such as .ca or .com) must have a minimum of two characters (as in .ca) and a maximum of four (as in .info). We can therefore use the minimum-maximum length specifier that I introduced earlier to express this additional requirement: