RegExp class for Flash 5 / Flash MX

I have been doing a lot of programming in JavaScript and PHP, using regular expressions extensively. They are very handy when you have to check the validity of user input, or do some complex string manipulation such as URL-wrapping all email addresses within a given plain text. All this and much more can be done in a couple of lines when you use regular expressions. Please believe me when I say that regular expressions may some day save you hundreds of hours of coding, and will make your code more readable and efficient. Much has been written about them by many authors, so I won't repeat it here. If you are interested in regular expressions, go to some online source. Here are a few to start with:

http://www.webreference.com/js/column5/
http://developer.netscape.com/docs/manuals/js/client/jsguide/regexp.htm
http://www.devshed.com/Server_Side/Administration/RegExp

So, when I started to dig through Flash 5 ActionScript, to my disappointment I found no RegExp object there. When I asked Macromedia people why this is so, I got the response that the Flash player would become too large with this functionality. Well, that's too bad. I am not sure it's true, though. The pure code should not be more than 10 KB. Is that a lot? Maybe. The people at Macromedia are under great pressure when deciding which features make it into the player binary, because plugin file size is a scarce resource. When Flash MX and the Flash 6 player were introduced, the RegExp engine still was not there. And now that Flash MX 2004 with the new Flash 7 player is here, we still don't have this industry standard. However, a new product, Macromedia Central, has a special player with RegExp support built in. While that is a good sign, the penetration rate of this player is not expected to be very high on the public internet.

Anyway, the lack of RegExp support was the reason I decided to write a class to enable regular expressions in ActionScript. After a week of serious coding, the result is here. It is almost 100% compatible with the JavaScript 1.2 standard, except for one minor syntax issue: Flash ActionScript syntax does not allow regular expression literals delimited by forward slashes, so patterns must be passed to the constructor as strings. To be honest, my ActionScript class is not the first one. To my surprise, Andy Black beat me to it, publishing his version of the RegExp object a couple of days ahead. Since I wanted to go public with finished code that is as close to JavaScript syntax as possible, I held mine back. However, applause to Andy: he didn't let me rest on my laurels. You be the judge of which implementation to use.

Flash MX 2004

Flash MX 2004 comes with a completely upgraded OOP ActionScript syntax, making it more like Java. That requires new AS code for the engine, which is bound to have some compatibility difficulties, since the Flash ActionScript compiler has got smarter and does not allow the use of new String object methods that are added at runtime. I am still consulting with MM people on how to overcome these compatibility issues. There are two or three classes floating around, made by eager community members. Those are based completely on the current Flash 5/MX code, wrapped in the new OOP syntax. There are several problems with these classes, and I don't feel like putting them here for distribution, as it would encourage the proliferation of non-standard code. So, I have put the development of my version of the MX 2004 code on hold until I can get some cooperation from the Macromedia side.

Meanwhile, to answer all your requests for "at least any" support for MX 2004, I provide one class for download. It was made by Joey Lott and is a pure encapsulation of the original Flash 5/MX code in the new AS2 syntax. I am afraid I cannot provide support for it if questions arise, though send them to me and I will try to sort them out.
Here is the ZIP with RegExp for Flash MX 2004 AS file: RegExp_JLott.zip

Books about RegExp in Flash, or regular expressions in general

Here is an assortment of books that cover the use of this class in Flash application development, plus a highly valuable book about regexes in general:

Downloads

RegExp.as - Single class file
RegExp.zip - Class file with HTML shell and FLA for testing and exploration of regular expressions. Includes AS panel customization file to allow comfortable use of RegExp in Flash MX.

If you make a project with this class and the project is available online, please send me a link so I can link to it from this page. I think it is important to make Flash people more acquainted with the power of regular expressions.
If you trap a bug, please let me know by sending an email to pavils@mailbox.riga.lv. However, please first check whether your problem is already answered in the FAQ section.
Also, if you are a developer willing to contribute to this project, mail me.

License

This class is provided to the Flash community for free, with a kind request to keep the copyright lines in the AS file untouched. However, debugging and developing the class takes much time, limiting my opportunities to earn income on other projects. To offset this, I have set up an account with PayPal. Please, if you find my work valuable, especially if you use it in commercial projects, make a donation to pavils@mailbox.riga.lv of whatever amount you feel is right. Please provide your email address when submitting the payment so I can add you to my upgrade news list. If you decide to extend the class, or to base your distributable projects on this class, try to separate the extension code from the original, leaving the original class code unchanged. Also, make sure that the ECMAScript standard is met. Do not host this class on your site, but provide a link to this page instead, as I may put bug-fixed and updated versions here. Please inform me of your intentions to include this class in other projects so we can decide which distribution framework is best.

A little request

Please keep reporting to me exactly how you use this class in your projects. Please, don't be lazy. I would be happy to compile regex case studies from your reports.

Brief documentation and notes

Constructor

RegExp("pattern",["g|i|m"])       creates a regular expression object instance

Methods

re.compile("pattern",["g|i|m"])   recompiles the given expression
re.test("string")                 returns true if a match is found in the string, false otherwise
re.exec("string")                 returns an object describing the latest match, or null if none is found
String.match(regexp)              returns an array of matches, or null if none is found
String.replace(regexp,replace)    returns a string in which the characters matching the regular expression are replaced with the replace string
String.search(regexp)             returns the index of the position where the regular expression is found, or -1 if none is found
String.split(regexp)              splits the string using the regular expression as delimiter and returns an array of the elements
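
The methods listed above follow the JavaScript RegExp API, so a rough sketch of their use can be written in plain JavaScript. This example runs against the native JavaScript engine, not against the class itself; the sample text and variable names are made up for illustration, and the class may differ in details (see the known bugs).

```javascript
// Patterns are passed as strings, so every backslash is doubled.
var re = new RegExp("(\\d+)-(\\d+)", "g");
var text = "Rooms 12-14 and 20-22 are free";

// test(): is there a match at all?
var found = re.test(text);                 // true
re.lastIndex = 0;                          // rewind, because "g" remembers the position

// exec(): details of the next match, including parenthesized groups.
var m = re.exec(text);                     // m[0] "12-14", m[1] "12", m[2] "14", m.index 6
re.lastIndex = 0;

// String methods that accept a regular expression.
var all = text.match(re);                  // ["12-14", "20-22"]
var swapped = text.replace(re, "$2-$1");   // "Rooms 14-12 and 22-20 are free"
var pos = text.search(re);                 // 6
var parts = "a1b22c333d".split(new RegExp("\\d+")); // ["a", "b", "c", "d"]
```

Rewinding lastIndex by hand is only needed because the same global expression is reused for test() and exec().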

Properties

RegExp.d             R/W   debug mode on/off
re.global            R     global search flag
re.ignoreCase        R     case-insensitivity flag
re.lastIndex         R/W   index at which the next search starts (just past the last match)
re.multiline         R     true if matching at the start or end of each line is enabled
re.source            R     regular expression pattern source
RegExp.lastMatch     R     last matched string
RegExp.lastParen     R     last matched parenthesized component
RegExp.leftContext   R     string to the left of the last match
RegExp.rightContext  R     string to the right of the last match
RegExp.$1..$9        R     first nine parenthesized component matches
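
As a rough illustration of these properties, here is a sketch in plain JavaScript (native engine, not the class). Modern JavaScript treats the static RegExp.* properties as deprecated, so the sketch derives the equivalent values from the exec() result instead. The left context is taken here as the whole string from zero to the start of the match, which is an assumption about the intended behavior.

```javascript
var re = new RegExp("(\\w+)@(\\w+)", "gi");
var text = "to: ANN@example, cc: bob@test";

// Instance properties mirror the constructor arguments.
var flags = [re.global, re.ignoreCase, re.multiline]; // [true, true, false]
var source = re.source;                               // (\w+)@(\w+)

// With "g", lastIndex moves just past each match.
var first = re.exec(text);
var afterFirst = re.lastIndex;    // 15
var second = re.exec(text);
var afterSecond = re.lastIndex;   // 29

// Values corresponding to the static properties after the second match.
var lastMatch = second[0];                                       // "bob@test"
var lastParen = second[second.length - 1];                       // "test"
var leftContext = text.slice(0, second.index);                   // "to: ANN@example, cc: "
var rightContext = text.slice(second.index + second[0].length);  // ""

// No more matches: exec() returns null and lastIndex is reset.
var third = re.exec(text);
var afterThird = re.lastIndex;    // 0
```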

Supported features

recognition of "i", "g" and "m" flags
*, ?, + counting metacharacters
counting metacharacters with "{}" brackets: {n} {n,} {n,m}
character set with [...]. The "]" character can be included in the set if escaped with "\". A character range can be defined using "-", as in "[0-9A-Z]". To include "-" in a character set, escape it with "\" or place it first in the definition: "[-A-Z]"
negated character set with [^...]
predefined character sets \d, \D, \s, \S, \w, \W and "."
\b and \B word boundary support
"OR" support with pipe "|" like in "Yes|No|Whatever"
^ and $ position metacharacters are recognized. They must be located at the beginning/end of the expression or of a first-level "OR" section. Examples: "^ABC$" - valid; "^ABC|DEF$|^GHI$" - valid
recognition of lastIndex property in consecutive searches when global flag is enabled
character grouping in parentheses, as in "M(iss)+ipi"
back references to a previously matched parenthesized group with \n (where n is a positive integer)
use of $1..$9 in String.replace()
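
To make the feature list concrete, here is a small sketch with one pattern per feature, written in plain JavaScript and run against the native engine rather than the class. The helper firstMatch and all sample strings are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
function firstMatch(pattern, flags, input) {
  var m = new RegExp(pattern, flags).exec(input);
  return m === null ? null : m[0];
}

var results = {
  optional:   firstMatch("colou?r", "", "color"),             // "color"
  counted:    firstMatch("a{2,3}", "", "aaaa"),               // "aaa"
  set:        firstMatch("[-A-Z]+", "", "x-RAY-y"),           // "-RAY-"
  negated:    firstMatch("[^0-9]+", "", "123abc456"),         // "abc"
  predefined: firstMatch("\\d+\\s\\w+", "", "buy 3 apples"),  // "3 apples"
  or:         firstMatch("Yes|No|Whatever", "", "Say No"),    // "No"
  anchors:    firstMatch("^ABC|DEF$|^GHI$", "", "xyzDEF"),    // "DEF"
  group:      firstMatch("M(iss)+ipi", "", "Mississipi"),     // "Mississipi"
  backref:    firstMatch("(\\w)\\1", "", "hello"),            // "ll"
  ignoreCase: firstMatch("hello", "i", "HeLLo"),              // "HeLLo"
  multiline:  firstMatch("^b.$", "m", "a\nbc\nd")             // "bc"
};

// \b only matches the standalone word, not the tail of "concat".
var boundary = "concat cat".search(new RegExp("\\bcat\\b"));  // 7

// $1..$9 in the replacement string refer to the parenthesized groups.
var renamed = "Doe, John".replace(new RegExp("(\\w+), (\\w+)"), "$2 $1"); // "John Doe"
```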

Frequently Asked Questions:

Q: I think your engine has a bug.
A: I appreciate any bug report and will write it down in my to-do list. However, please note that it is a real bug only if, when testing the same regular expression with the same string in the testing console, you get different results in the Flash and Browser properties and/or result windows. If so, please provide your input data so I can reproduce the error.

Q: I have this code:
var re = new RegExp("\w{3}", "i");
trace(re.test("ABC"));

Why does this expression return "true" in the testing console, but "false" in my code?
A: Note that the ActionScript compiler tries to interpret the character following a backslash before the argument is passed to the RegExp constructor. In the case above, the RegExp constructor receives the string "\{3}" and will look for a completely different pattern than you expected. Instead of a single backslash you have to write two: new RegExp("\\w{3}", "i")
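
The same pitfall exists in plain JavaScript string literals, which makes it easy to demonstrate outside Flash. The sketch below uses the native JavaScript engine; the exact string the ActionScript compiler produces for a single backslash may differ from what JavaScript produces, but in both cases the engine never sees "\w". The variable names are made up for illustration.

```javascript
var single = "\w{3}";    // JavaScript drops the backslash: the string is w{3}
var doubled = "\\w{3}";  // the string is \w{3}, which is what the engine needs

// w{3} means "three letters w", so "ABC" does not match but "www" does.
var singleMatchesABC = new RegExp(single, "i").test("ABC");    // false
var singleMatchesWWW = new RegExp(single, "i").test("www");    // true

// \w{3} means "three word characters", so "ABC" matches.
var doubledMatchesABC = new RegExp(doubled, "i").test("ABC");  // true
```

Checking the length of the pattern string (4 versus 5 characters here) is a quick way to see whether a backslash got lost.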

Looking for help on:

RegExp.multiline property - why the heck does RegExp have this property, if the regular expression object already has it?
RegExp.input property - what it is used for and how it must really behave.
RegExp.leftContext property - some references say it's not the whole string to the left of the match, but only the part starting from the previous lastIndex position. MSIE5 JS returns the whole portion from zero to the start of the match. Which one is the right behavior?

Known bugs:

- In the case of ORed expressions with parens like "(abc)|(def)|(ghi)", when testing on the string "defghiabc" the $1..$9 and lastParen properties of RegExp (and the array returned by exec()) contain different values from the MSIE5.5 implementation. For the sake of speed, this class walks through all the ORed sections remembering parenthesized components, and jumps to the next OR section once it passes the best matching index so far. MSIE5.5 evaluates all sections separately and remembers parenthesized component matches only from the winning section, leaving all others empty.
- The regexes in parentheses are extremely "greedy". If they can't eat everything, they match zero characters. For example, consider the expression "(a+)a". It should match the string "aa". But the expression in parens eats up all the a's, so the second "a" cannot be matched. A fix for this bug involves much restructuring of the engine, so it is not available right now. A workaround could be restructuring the expression itself.
- When an expression with the "^" metacharacter has been used, all subsequent calls to the RegExp engine expect to match at the beginning of the string, regardless of whether the "^" metacharacter is used again or not. The fixed files are available for download.
- When an expression with the "^" metacharacter is used in combination with the "global" flag, and the pattern-matching substrings immediately follow each other, consecutive calls to the "test" method will yield multiple matches, of which only the first is correct. This bug causes erroneous behaviour of the String.match method. To reproduce the bug, use the pattern "^[abc]" (don't forget the global flag) on the string "cabadef", and check the results of the String.match method. Currently no fix is available.
- The escaped "\" character in a character range definition was not interpreted correctly. To reproduce the bug, use the pattern "^[\\\\]" (don't forget the global flag) on the string "\\", and check the results of the regex.test method. The fixed files are available for download.
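
For comparison when reproducing the bugs above, the sketch below runs the same patterns and strings through a native JavaScript engine. It shows what a current standards-following engine returns, which is an assumption about the target behavior; MSIE 5.5 may differ in detail (for example in how unmatched groups are reported). Variable names are made up for illustration.

```javascript
// ORed groups: only the group of the winning section is filled in.
var ored = new RegExp("(abc)|(def)|(ghi)").exec("defghiabc");
// ored[0] "def", ored[1] undefined, ored[2] "def", ored[3] undefined

// Greedy group followed by a literal: the group gives one "a" back.
var greedy = new RegExp("(a+)a").exec("aa");   // ["aa", "a"]

// "^" in one expression must not affect a later, unanchored expression.
var anchored = new RegExp("^abc").test("abcdef");   // true
var unanchored = new RegExp("def").test("abcdef");  // true

// "^" with the global flag: only the very first character can match.
var globalCaret = "cabadef".match(new RegExp("^[abc]", "g"));  // ["c"]
var g = new RegExp("^[abc]", "g");
var consecutive = [g.test("cabadef"), g.test("cabadef")];      // [true, false]

// Escaped backslash inside a character set, tested on a single backslash.
var backslash = new RegExp("^[\\\\]", "g").test("\\");         // true
```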