Kurt Cagle

Real-World AJAX Book Preview: Getting Expressive with Regular Expressions

This content is reprinted from Real-World AJAX: Secrets of the Masters published by SYS-CON Books. To order the entire book now along with companion DVDs for the special pre-order price, click here for more information. Aimed at everyone from enterprise developers to self-taught scripters, Real-World AJAX: Secrets of the Masters is the perfect book for anyone who wants to start developing AJAX applications.

Getting Expressive with Regular Expressions
Regular expressions (or Regexes, as they are sometimes called) provide a way of defining text patterns that can be used for validation, testing, and string replacement. The Regex language has expanded considerably over the years, providing a remarkably rich and robust set of tools for parsing content and building new content, something that comes in handy when dealing with AJAX-based systems.

In JavaScript, regular expressions are core objects just like strings and arrays and can be defined using either a specific object (in this case the RegExp() object), or by using the forward slash delimiters // (just as [] designates an array and "" designates a string). Thus, a regular expression matching the string sequence 'test' could be declared as:

var retest = new RegExp('test');
var retest = /test/;
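
One practical difference between the two forms is how backslashes are written. The following is a minimal sketch (the variable names and the sample word are illustrative assumptions, and console.log stands in for the print() function used elsewhere in this chapter):

```javascript
// Literal form: backslashes are written once.
const reDigitsLiteral = /\d+/;

// Constructor form: the pattern is an ordinary string, so each
// backslash has to be doubled to survive string escaping.
const reDigitsCtor = new RegExp("\\d+");

// The constructor is useful when the pattern is only known at run time.
const word = "test"; // hypothetical user-supplied value
const reDynamic = new RegExp(word, "i");

console.log(reDigitsLiteral.source === reDigitsCtor.source); // true
console.log(reDynamic.test("Testament"));                    // true
```

The literal form is usually easier to read; the constructor form earns its keep when the pattern has to be assembled from variables.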

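Note: You should be careful to differentiate between the forward slash containers used in regexes and the comment delimiter //. The expression

retest = //

does not create an empty regular expression; JavaScript reads the two slashes as the start of a comment. To create an empty regular expression, use /(?:)/ or new RegExp("") instead.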

Regular expressions consist of two parts:

  • Pattern. The pattern is the sequence of characters that identifies the regular expression.
  • Flags. The flags consist of three distinct character indicators that determine the scope of the regex:
    - Global (g): The global flag indicates that the regular expression should be applied to all potential matches in a string rather than just the first. If the global flag is false, only the first occurrence of a regular expression will be returned.
    - Ignore Case (i): This flag indicates that the regular expression should match upper- and lower-case alphabetic characters indiscriminately. If the ignoreCase flag is false, the regular expression will match only those terms that have exactly the same case.
    - Multiline (m): Normally the anchor characters ^ and $ match only at the very start and end of the string being searched. If the multiline flag is set, they also match at the start and end of each line, that is, immediately after and before a carriage return or new line character.
You can set these flags in one of two ways: either by putting the flags in the regular expression literal after the second forward slash, or by passing them as the second argument of the RegExp() constructor. For instance, to create a regular expression that will search through an entire file for all instances of the word "test" regardless of case ("TEST," "Test," "test," etc.), your regular expression would look like:

var reTest = /test/gmi;

or

var reTest = new RegExp("test","gmi");

The flags are also exposed as the properties global, ignoreCase, and multiline, but these are read-only: they report how the regular expression was created and cannot be used to change it. To change the flags, create a new regular expression from the old one's source:

var reTest = new RegExp("test");
print(reTest.global);
=> false
reTest = new RegExp(reTest.source, "gmi");
print(reTest.global);
=> true
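
To see what the flags do in practice, the following sketch inspects the flag properties of a regex and shows the effect of the i and m flags on matching. The sample text is an illustrative assumption, and console.log is used in place of print():

```javascript
const reFlags = /test/gmi;

// The flag properties report how the regex was created.
console.log(reFlags.global, reFlags.ignoreCase, reFlags.multiline); // true true true

// i flag: case is ignored when matching.
console.log(/test/.test("TEST"));  // false
console.log(/test/i.test("TEST")); // true

// m flag: ^ and $ also match at the start and end of each line.
const twoLines = "first test\ntest again";
console.log(/^test/.test(twoLines));  // false: "test" is not at the start of the string
console.log(/^test/m.test(twoLines)); // true: "test" starts the second line
```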

The simplest operation that a regular expression can be used with is the test() method. This method of the regex object compares the string passed to it against the pattern and returns true if the pattern matches anywhere in the string, and false otherwise. For instance:

var reTest=/test/i;
print(reTest.test("Testament"));
=> true
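
One detail worth knowing when combining test() with the global flag: a global regex remembers where its last match ended (in its lastIndex property) and resumes from there on the next call. A short sketch of the effect, using illustrative variable names:

```javascript
const reGlobal = /test/g;

const first = reGlobal.test("test");  // matches; lastIndex moves to 4
const second = reGlobal.test("test"); // search resumes at index 4, fails, lastIndex resets to 0
const third = reGlobal.test("test");  // matches again from the start

console.log(first, second, third); // true false true

// Without the g flag, test() always starts from the beginning.
const rePlain = /test/;
console.log(rePlain.test("test"), rePlain.test("test")); // true true
```

For simple yes/no validation it is therefore usually safer to leave the g flag off.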

Beyond test(), the next most useful regular expression command is actually located on the String() object – the replace() method. This particular method uses the string it's attached to as its base and a Regular Expression argument to find a set of matches, then replaces matches with the second argument.

For instance, suppose you wanted to suppress the appearance of all numbers in a credit card sequence and replace them with asterisk characters. You could use the following commands:

cc = "123-456-789";
reNum=/[0-9]/g;
print(cc.replace(reNum,"*"));
=> ***-***-***

Note that replace() doesn't alter the original string (unlike array methods such as sort() or reverse(), which change the array in place), but rather creates a new string as a result; that is to say, the value in the variable cc remains the same.
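
A brief sketch confirming both points, that the source string is untouched and that the g flag controls how many matches are replaced (variable names are illustrative, and console.log stands in for print()):

```javascript
const cardNumber = "123-456-789";

// With g, every digit is replaced.
const masked = cardNumber.replace(/[0-9]/g, "*");
console.log(masked);     // ***-***-***

// The original string is unchanged.
console.log(cardNumber); // 123-456-789

// Without g, only the first match is replaced.
const firstOnly = cardNumber.replace(/[0-9]/, "*");
console.log(firstOnly);  // *23-456-789
```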

The notation [0-9] is one of many different abbreviations that make regexes at least notionally easier to work with. In this particular case, it indicates a match of any single character in the range of 0 to 9, i.e., any numeric digit. If you wanted to indicate all alphanumeric characters you'd set up three ranges: [0-9A-Za-z]. You could also use the pipe "|" character to indicate alternatives:

(0|1|2|3|4|5|6|7|8|9)

But obviously this is going to be more cumbersome. The pipe does come in handy, however, when you're trying to provide a range of potential values to be used for validation, such as a range of colors:

reColors = /^(red|blue|green|yellow|orange|purple|black|white)$/;
color="red";
print(reColors.test(color));
=> true;
color="gold";
print(reColors.test(color));
=> false;
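
As a sketch of how the two notations relate, the following checks that a character class and the equivalent alternation agree on a few samples, then builds a validation regex from a list of allowed values. The list and the variable names are illustrative assumptions:

```javascript
const reDigitClass = /^[0-9]$/;
const reDigitAlt = /^(0|1|2|3|4|5|6|7|8|9)$/;

// Both forms accept exactly one digit and reject everything else.
const samples = ["0", "7", "a", "42"];
const agree = samples.every(function (s) {
  return reDigitClass.test(s) === reDigitAlt.test(s);
});
console.log(agree); // true

// An alternation can be assembled from an array of allowed values.
const allowedColors = ["red", "blue", "green"]; // hypothetical list
const reColorList = new RegExp("^(" + allowedColors.join("|") + ")$");
console.log(reColorList.test("green")); // true
console.log(reColorList.test("gold"));  // false
```

This approach assumes the list entries contain no characters with special meaning in a regex; values that do would need escaping first.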

The two characters caret "^" and dollar "$" indicate that the regular expression should be valid from the start of the search range (the first character) to the end of the search range (the last character). Without them, the regular expression would return true if the target sequence was found anywhere in the source string. Thus,

reColors1 = /^red$/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> false;
reColors1 = /red/;
color="red";
print(reColors1.test(color));
=> true;
color="barred";
print(reColors1.test(color));
=> true;

There are numerous other specialized characters that are used with regular expressions. As with strings, these character sequences are indicated with an escaping backslash, and for the most part correspond to string notation (see Table 2.2).

In general, if a character has a specialized meaning in a regular expression, escaping it will cause the character itself to be matched instead, such as \( indicating a literal parenthesis character rather than the start of a group.
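
A few of the standard JavaScript escape sequences in action. This is only a small sample for orientation, not a reproduction of the table, and the test strings are illustrative:

```javascript
// \d digit, \w word character, \s whitespace
console.log(/\d/.test("a1"));             // true
console.log(/\w+/.exec("hi there")[0]);   // "hi"
console.log(/\s/.test("no_spaces"));      // false

// \b word boundary: matches "red" only as a whole word
console.log(/\bred\b/.test("barred"));    // false
console.log(/\bred\b/.test("a red car")); // true

// \t matches a tab, just as in string notation
console.log(/\t/.test("a\tb"));           // true

// Escaping a special character matches the character itself
console.log(/\(\d+\)/.test("(42)"));      // true
console.log(/\./.test("abc"));            // false: an escaped dot only matches "."
```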

In addition to these characters, the regular expression library includes a number of operators to determine existence, repetition, and negation, as given in Table 2.3.
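
As a rough illustration of the existence, repetition, and negation operators, the sketch below uses ?, *, +, {n,m}, and a negated character class. It covers only a handful of common operators and the sample strings are assumptions:

```javascript
// ? : the preceding item is optional (zero or one)
console.log(/^colou?r$/.test("color"));  // true
console.log(/^colou?r$/.test("colour")); // true

// + : one or more; * : zero or more
console.log(/^\d+$/.test(""));           // false
console.log(/^\d*$/.test(""));           // true

// {n,m} : between n and m repetitions
console.log(/^\d{2,4}$/.test("123"));    // true
console.log(/^\d{2,4}$/.test("12345"));  // false

// [^...] : any character NOT in the set
console.log(/^[^0-9]+$/.test("abc"));    // true
console.log(/^[^0-9]+$/.test("ab1"));    // false
```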

For instance, let's say you want to ensure that a given content block was a credit card of the form 123-456-789. You could use a regular expression with the abbreviated forms to check not only the boundaries but the repetitions:

var cc = "123-456-789";
var reCC = /^\d{3}-\d{3}-\d{3}$/;
print(reCC.test(cc));
      => true

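Postal codes are a little more complex, especially if you want to accept both American ZIP codes and Canadian postal codes. (British postcodes follow a different, more variable format and are not covered by the pattern below.) If you have to check both in the same field, the regex might look something like: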

var rePostalCode = /^\d{5}(-\d{4})?$|^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;

This rather cryptic string can be broken down fairly handily into several component parts, as shown in Table 2.4:
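
One way to read the expression is to split it at the top-level pipe into its two alternatives and try each against sample input. The sketch below does that; the names reUsZip and reCaPostal and the sample codes are illustrative assumptions:

```javascript
// Left alternative: five digits, optionally followed by a hyphen and four digits.
const reUsZip = /^\d{5}(-\d{4})?$/;

// Right alternative: letter-digit-letter, an optional hyphen or space,
// then digit-letter-digit. The i flag accepts either case.
const reCaPostal = /^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;

// The combined pattern from the text.
const rePostalCode = /^\d{5}(-\d{4})?$|^[a-z]\d[a-z](\-|\s)?\d[a-z]\d$/i;

const postalSamples = ["98101", "98101-1234", "K1A 0B1", "k1a-0b1", "K1A0B1", "9810", "SW1A 1AA"];
postalSamples.forEach(function (code) {
  // The combined regex accepts a code exactly when one of the halves does.
  const byParts = reUsZip.test(code) || reCaPostal.test(code);
  console.log(code, rePostalCode.test(code), byParts);
});
```

The first five samples are accepted; "9810" is too short, and "SW1A 1AA" does not fit either alternative.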

While you can do straight validations with regular expressions (especially useful for forms processing), regexes are actually more powerful when combined with the String().replace() method. While replace() normally takes a string as its first argument (the text to be replaced), if a regular expression is supplied instead, you can take advantage of its considerably richer capabilities to achieve some nearly magical effects.

For instance, suppose you wanted to replace everything that looks like it might be an e-mail address with a mailto: link. You can use regexes to solve this problem quite easily:

msg = "For more information, please contact Kurt Cagle at kurt.cagle@gmail.com or " +
      "Tom Generic at generic@generic.com.";
// No i flag here: [A-Z] must match only the capitalized words of a name.
reAtMail = /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g;
linkedMsg = msg.replace(reAtMail,'<a href="mailto:$2">$1</a>');
print(linkedMsg);
=> For more information, please contact <a href="mailto:kurt.cagle@gmail.com">Kurt Cagle </a> or <a href="mailto:generic@generic.com">Tom Generic </a>.

This particular regular expression looks for the pattern "Name Name at username@server" and rewrites it as <a href="mailto:username@server">Name Name</a>. This illustrates both capturing groups (anything in parentheses) and non-capturing groups (anything in parentheses starting with ?:). Each capturing group is numbered in order of its opening parenthesis, and the replace() method's second parameter can reference the captured text as $1, $2, $3, and so on, as part of a string template to insert the matched text back into the resulting string. Note that the name pattern relies on case: adding the i flag would let [A-Z] match lower-case letters too, so ordinary words before the name would be swept into the link.
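
The captured groups can also be inspected directly with exec(), or handed to a function instead of a template string. The following sketch assumes a sample sentence of its own and uses a callback to trim the trailing space that the name group picks up:

```javascript
const reNameAtMail = /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/;
const sentence = "please write to Kurt Cagle at kurt.cagle@gmail.com today.";

// exec() returns the whole match at index 0 and each capturing group after it.
const found = reNameAtMail.exec(sentence);
console.log(found[0]); // "Kurt Cagle at kurt.cagle@gmail.com"
console.log(found[1]); // "Kurt Cagle " (note the trailing space)
console.log(found[2]); // "kurt.cagle@gmail.com"

// replace() also accepts a function; it receives the match and the groups.
const linkedSentence = sentence.replace(reNameAtMail, function (whole, name, address) {
  return '<a href="mailto:' + address + '">' + name.trim() + "</a>";
});
console.log(linkedSentence);
// please write to <a href="mailto:kurt.cagle@gmail.com">Kurt Cagle</a> today.
```

A callback is a reasonable choice whenever the captured text needs cleaning up before it goes back into the result.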

Regular expressions are incredibly powerful for parsing and converting both text- and XML-based content and should be considered an indispensable part of any AJAX-based toolkit. Indeed, especially in validation types of applications, you can actually create libraries of commonly used regexes consolidated as a single object, such as:

var RegexLib = {
reMail: /((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g,
reAtMail: /((?:[A-Z]\w+\s?)+)at\s((?:\w+[._-])*\w+@(?:\w+\.)*\w+)/g,
reDoubleQuote: /"([^"]*)"/g,
reSingleQuote: /'([^']*)'/g
};

msg = "For more information, please contact Kurt Cagle at kurt.cagle@gmail.com or " +
      "Tom Generic at generic@generic.com.";
print(msg.replace(RegexLib.reAtMail,"<a href='mailto:$2'>$1</a>"));
=> For more information, please contact <a href='mailto:kurt.cagle@gmail.com'>Kurt Cagle </a> or <a href='mailto:generic@generic.com'>Tom Generic </a>.
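
The other entries in such a library are used the same way. As a sketch, the block below defines a cut-down library (named QuoteLib here to keep it separate from the one above) and uses the double-quote pattern both to rewrite and to extract quoted text. The sample sentence and the choice of <q> tags are illustrative assumptions:

```javascript
const QuoteLib = {
  reDoubleQuote: /"([^"]*)"/g,
  reSingleQuote: /'([^']*)'/g
};

const quotedText = 'She said "hello" and then "goodbye".';

// Rewrite each double-quoted phrase, reusing the captured text via $1.
const tagged = quotedText.replace(QuoteLib.reDoubleQuote, "<q>$1</q>");
console.log(tagged); // She said <q>hello</q> and then <q>goodbye</q>.

// Collect just the captured phrases. matchAll() requires the g flag.
const phrases = Array.from(quotedText.matchAll(QuoteLib.reDoubleQuote), function (m) {
  return m[1];
});
console.log(phrases); // [ 'hello', 'goodbye' ]
```

Because these shared regexes carry the g flag, they are best used with replace() and matchAll() rather than with repeated test() calls, which depend on the regex's lastIndex state.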

More Stories By Kurt Cagle

Kurt Cagle is a developer and author, with nearly 20 books to his name and several dozen articles. He writes about Web technologies, open source, Java, and .NET programming issues. He has also worked with Microsoft and others to develop white papers on these technologies. He is the owner of Cagle Communications and a co-author of Real-World AJAX: Secrets of the Masters (SYS-CON books, 2006).
