-
VBScript Regular ExpressionsFull-Stack/Back-end 2008. 8. 2. 22:24
Regular expressions in VBScript are two words that can bring many to their knees, weeping, but they are not as scary as some would have you believe. With their roots in Perl, regular expressions in VBScript use similar syntax, and the chances are that you may already be familiar with the concepts here if you have played with regular expression matching before.
Below, you will find three sections. The first section, Reference, is a simple reference listing the most-used of the various symbols and characters used in regular expressions. The second section, Functions, has two functions in it that may make life easier for you. The third section, Examples, is where the fun begins - examples of regular expressions in action.
Reference
Character Sets and Grouping
- . - Any single character (except new line character, "\n")
- [] - Encloses any set of characters
- ^ - Matches any characters not within following set
- [A-Z] - Any upper case letter between A and Z
- [a-z] - Any lower case letter between a and z
- [0-9] - Any digit from 0 to 9
- () - Group section. Also can then be back-referenced with $1 to $n, where n is the number of groups
- | - Or. (ab)|(bc) will match "ab" or "bc"
Repetition
- + - One or more
- * - Zero or more
- ? - Zero or one
- {5} - Five
- {1,3} - One to three
- {2,} - Two or more
Positioning
- ^ - Start of string
- $ - End of string
- \b - End of word
- \n - New line
- \r - Carriage return
Miscellaneous
- \ - Escape character
- \t - Tab
- \s - White space
- \w - Matches word (equivalent of [A-Za-z0-9_])
Please note that the escape character mentioned above is not usable in normal VBScript. Regular expression syntax is based upon Perl regular expression syntax. To escape a character in VBScript, you usually double it. For example, the following will print out 'This is a "quoted" piece of text'.
response.write("This is a ""quoted"" piece of text.")
Functions
The first of the functions below, ereg (named after the PHP function to keep me from going quite quite mad), is the one you will probably use most. Simply put, if you feed in a string, pattern, and choose whether or not you would like to ignore the case of letters in either, the function will return TRUE if the string contains the pattern, or FALSE if not.
function ereg(strOriginalString, strPattern, varIgnoreCase)
' Function matches pattern, returns true or false
' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
dim objRegExp : set objRegExp = new RegExp
with objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
end with
ereg = objRegExp.test(strOriginalString)
set objRegExp = nothing
end function
Next up we have ereg_replace. Like it's shorter cousin, you need to feed it a string, a pattern and choose your case sensitivity. This time, you must also add a replacement. This function will replace all instances of the pattern with the replacement in the string (if you change ".Global = True" to ".Global = False" then the function will only replace the first instance of the pattern with the replacement).
function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
' Function replaces pattern with replacement
' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
dim objRegExp : set objRegExp = new RegExp
with objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
end with
ereg_replace = objRegExp.replace(strOriginalString, strReplacement)
set objRegExp = nothing
end function
Examples
Example 1: Checking hexadecimal string
A hexadecimal number can be made up of any digit, and any letter, upper or lower case, between a and f, inclusive. So to check if a string is actually hexadecimal, the following will do quite nicely (strOriginalString is the original string to be tested):
<%
if ereg(strOriginalString, "[^a-f0-9\s]", True) = True then
response.write "String is not hexadecimal."
else
response.write "String is hexadecimal."
end if
%>
The pattern, "[^a-f0-9\s]" matches anything that is not in the set of characters specified (so if there is anything in the string that is not in that set, the function will return True). The characters specified are all letters between a and f inclusive, and we've specified a case insensitive match, so upper case letters will be treated the same way. We are also allowing whitespace (new lines, spaces, carriage returns and tabs), which is what the "\s" represents in regular expressions.
Example string that returns False (and is therefore hexadecimal):
AAcc99
Example 2: Masking the last section of an IP address
An IP address is made up of four sets of numbers seperated by periods. It's common practice, if you are going to display visitor (or any) IP address on your site, to mask the last (fourth) set of numbers. Here's a way to use ereg_replace to do just this:
<%
strOriginalString = ereg_replace(strOriginalString, "([^0-9])([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.[0-9]{1,3}([^0-9])", "$1$2.$3.$4.***$5", True)
%>
This is a little more tricky, as you'd hopefully expect from a second example. It looks harder than it is though, so one step at a time. There are actually only a few entities in the pattern - they are just repeated. The most important is this: "([0-9]{1,3})". It matches a section of an IP adress, and is enclosed in brackets so that this section can be used in the replacement of the pattern as well (otherwise we would not be able to keep the first three parts of the IP address to display). You can see these sections in use, referenced with "$2", "$3" and "$4" in the replacement. The pattern within the brackets simply says "between one and three digits between 0 and 9".
The second repeated section is "\.". We use a backslash before the period to indicate that this period (the character following the backslash) is to be treated as a normal period. We call this an escaped character, and this is a fairly common practice. The period, unescaped (without the backslash), is used as a symbol representing "any character except the new line character".
Example input text:
My IP address is 123.456.78.9 but 4444.1.1.1 is just a bunch of random numbers, and so is 12.34.56, and 1.1.1.1 is another valid IP.
Example output text:
My IP address is 123.456.78.*** but 4444.1.1.1 is just a bunch of random numbers, and so is 12.34.56, and 1.1.1.*** is another valid IP.
Example 3: Making the second word of every sentence in a string bold, as long as the word before only contains upper case letters and the second word does not contain an even digit
Getting more interesting now, this example is not in the least bit useful in practice, but should prove to be a useful demonstration of the power of regular expressions. It sounds tough - but with regular expressions, it's a walk in the park.
<%
strOriginalString = ereg_replace(". " & strOriginalString, "(\.|!|\?)\s([A-Z]+)\s([^02468\s]+)\s", "$1 $2 <strong>$3</strong> ", False)
strOriginalString = mid(strOriginalString, 2)
%>
We start by adding an artificial period and space to the beginning of the string, just to make sure we catch the first sentence, and add a line to strip our extra characters out afterwards. We only want those sentences split with punctuation and a space, or we'll end up with bold decimals and it will be very messy indeed. So, we check for puncuation, followed by a space, followed by a word made entirely of capitals, followed by another space, followed by a second word that doesn't contain even numbers, or whitespace, followed by a space. If we find that, we replace it with the same items we picked up in brackets, only with a <strong></strong> tag pair around the second word.
Example input text:
THE quick brown fox jumped over the lazy dog? Many red balloons blew up! EVEN num2ber sentence. ODD num3ber sentence.
Example output text:
THE quick brown fox jumped over the lazy dog? Many red balloons blew up! EVEN num2ber sentence. ODD num3ber sentence.
댓글