Friday, 09 January 2009
 
Limitations of the Basic Syntax

.+@.+\..+

can be used to indicate:

At least one instance of any character, followed by
The "@" character, followed by
At least one instance of any character, followed by
The "." character, followed by
At least one instance of any character.

As you might have guessed, this expression is a very rough form of email address validation. Note how I have used the backslash character (\) to force the regex compiler to interpret the penultimate "." as a literal character, rather than as another instance of the "any character" regular expression.
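
As a quick check of the behaviour described above, the following sketch runs the rough pattern against a few strings. Python's re module is used purely for illustration, and the sample addresses and variable names are made up for this example.

```python
import re

# The rough pattern from the text:
# something, "@", something, a literal ".", something.
rough = re.compile(r".+@.+\..+")

# The same pattern with the backslash removed: the penultimate "."
# now means "any character" instead of a literal dot.
unescaped = re.compile(r".+@.+..+")

samples = ["user@example.com", "user@example", "@example.com", "a@b.c"]

for s in samples:
    # fullmatch() succeeds only if the whole string fits the pattern.
    print(s, bool(rough.fullmatch(s)), bool(unescaped.fullmatch(s)))
```

The second sample has no dot after the "@", so only the unescaped variant accepts it, which is exactly why the backslash matters.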

However, that is a rather primitive way of checking the validity of an email address. After all, in the simplified model used here, only letters of the alphabet, the underscore character (_), the minus character (-), and digits are allowed in the name, domain, and extension portions of an email address. This is where range specifications come into play.

As mentioned previously, anything within unescaped square brackets represents a set of alternatives for a particular character position. For example, [abc] indicates either an "a", a "b", or a "c". However, representing something like "any letter or digit" by listing every possible symbol inside the square brackets would give birth to some ridiculously long regular expressions, and regexes are complex enough as it is.
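
A small sketch of a set of alternatives, again in Python and with arbitrary test words chosen for this example:

```python
import re

# [abc] matches exactly one character, which must be "a", "b", or "c".
pattern = re.compile(r"[abc]at")

words = ["bat", "cat", "hat", "at"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

Only "bat" and "cat" are kept: "hat" starts with a character outside the set, and "at" has no character at all in the bracketed position.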

Luckily, it's possible to specify a "range" of characters by separating them with a dash. For example, [a-z] means "any lowercase letter." You can also specify more than one range and combine them with individual characters by placing them side by side. For example, our email validation requirements can be satisfied by the expression [A-Za-z0-9_-] (a dash placed last inside the brackets is taken literally rather than as a range separator), which turns the overall regex into

[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
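
The range-based pattern can be tried out with a sketch like the one below. The class used in the sketch includes a trailing dash so that the minus character is accepted; that detail, like the helper name looks_like_email, is an assumption of this example rather than something prescribed by the text.

```python
import re

# Letters, digits, underscore, and a literal "-" (a dash placed last
# inside the brackets is not treated as a range separator).
CHARS = r"[A-Za-z0-9_-]"

# name @ domain . extension, each part built from the same class.
email = re.compile(CHARS + r"+@" + CHARS + r"+\." + CHARS + r"+")

def looks_like_email(s):
    """Return True if the whole string fits the simplified pattern."""
    return email.fullmatch(s) is not None

for s in ["john_doe@example.com", "mail-1@my-host.net",
          "john doe@example.com", "first.last@example.com"]:
    print(s, looks_like_email(s))
```

Note that an address with a dot in the name part is rejected, because the dot is not in the class; the pattern is stricter than the first version but still a simplification.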

The range specifications that we have seen so far are all inclusive; that is, they tell the regex compiler which characters can be in the string. Sometimes, it's more convenient to use exclusive specifications, dictating that any character except the ones you specify is valid. This can be done by placing a caret character (^) immediately after the opening square bracket, before the character specifications. For example, [^A-Z] means "any character except an uppercase letter of the alphabet."
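
A minimal sketch of an exclusive specification in Python; the input string is an arbitrary example:

```python
import re

# [^A-Z] matches any single character that is NOT an uppercase ASCII letter.
# The caret negates the class only when it comes first inside the brackets.
not_upper = re.compile(r"[^A-Z]")

# findall() returns every non-overlapping match, here one character at a time.
found = not_upper.findall("PHP 5 Rocks!")
print(found)
```

The uppercase letters P, H, P, and R are skipped; spaces, digits, lowercase letters, and punctuation all match.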

Going back to the email validation regex, it's still not as good as it could be. For example, we know for sure that a domain extension (for example, .ca or .com) must have a minimum of two characters (as in .ca) and a maximum of four (as in .info). We can therefore use the minimum-maximum length specifier that I introduced earlier to specify this additional requirement:
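
One way such a pattern might look is sketched below in Python. This is an illustration, not necessarily the author's expression: it assumes that the {2,4} interval quantifier is the minimum-maximum specifier meant here, and it reuses a character class with a trailing literal dash, which is also an assumption of the sketch.

```python
import re

CHARS = r"[A-Za-z0-9_-]"  # assumed character class; the trailing "-" is literal

# {2,4} means "at least two and at most four repetitions" of the class
# that precedes it, so the extension must be 2 to 4 characters long.
email = re.compile(CHARS + r"+@" + CHARS + r"+\." + CHARS + r"{2,4}")

def has_valid_extension(address):
    # fullmatch() requires the entire string to fit the pattern.
    return email.fullmatch(address) is not None

for address in ["me@example.ca", "me@example.info",
                "me@example.c", "me@example.museum"]:
    print(address, has_valid_extension(address))
```

The one-character and six-character extensions are rejected, while .ca and .info pass.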

