Regular expressions are extremely useful when you are dealing with lots of data and need to filter out exactly what you want. It's like looking for a needle in a haystack.
Their use is marred by the steep learning curve that comes with them.
Perl developers are generally familiar with regular expressions, since they deal with the symbols and expressions all the time, whereas a developer coming from a Microsoft technology background will find it hell to understand and write them.
I will try to ease some of the pain for beginners here by discussing the basics of regular expressions, since understanding the bits and pieces makes it much easier to build a complete expression.
The ^ or caret symbol anchors the match to the start of the string. For example, ^abcd matches any string that begins with 'abcd': it will match 'abcd-1234' and 'abcd' but will not match 'ab' or '1212abcd', since the string must start with 'a', be followed by 'bcd', and may then have anything else after it.
Remember to put the caret at the start of the expression whenever the match has to begin at the start of the string; without it the pattern can match anywhere inside the text, which can result in serious flaws, for example when validating input.
The $ or dollar symbol marks the end of the string to match. For example, ^abcd$ will match only 'abcd': the string must start with 'a', end with 'd' and have 'bc' in between, in the same order as in the expression. So 'abcd-123' will not match, and '12abcd' will not match either.
Remember to put the $ symbol at the end of the expression whenever the match has to run all the way to the end of the string.
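To see the two anchors in action, here is a minimal sketch in Python (my choice for illustration; the post itself is not tied to a language). The helper name matches is made up for this example, and the test strings are the ones used above.

```python
import re

# Anchored at the start only: the string must begin with "abcd".
starts_with = re.compile(r"^abcd")
# Anchored at both ends: the whole string must be "abcd".
# (Note: Python's $ also tolerates a single trailing newline.)
exact = re.compile(r"^abcd$")


def matches(pattern, text):
    """Return True if the compiled pattern matches somewhere in text."""
    return pattern.search(text) is not None


for text in ["abcd", "abcd-1234", "ab", "1212abcd"]:
    print(text, matches(starts_with, text), matches(exact, text))
```

Without the caret, the pattern abcd would also be found inside '1212abcd', which is exactly the kind of flaw the anchors prevent.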
The * symbol matches the preceding character zero or more times. For example, "to*n" matches "tn", "ton" or "toon". The + symbol matches the preceding character one or more times, so "to+n" matches "ton" and "toon" but not "tn".
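To match a character repeated a bounded number of times we use the {} symbols. For example, "spo{1,2}n" will match spon and spoon but will not match spooon. (The [] symbols do something different: they match any one character from a set, so "spo[1-2]n" matches only "spo1n" and "spo2n".)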
? = makes the preceding character optional.
* = {0,}
+ = {1,}
? = {0,1}
Remember that in the format {a,b} there should be no space between the comma and the numbers.
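The quantifiers are easier to compare side by side. The sketch below is in Python, which is an assumption on my part, and the helper full is a hypothetical name; it uses re.fullmatch so that each word has to match from its first letter to its last.

```python
import re


def full(pattern, word):
    """Return True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None


t_words = ["tn", "ton", "toon"]
spoons = ["spn", "spon", "spoon", "spooon"]

star = [w for w in t_words if full(r"to*n", w)]          # zero or more o
plus = [w for w in t_words if full(r"to+n", w)]          # one or more o
optional = [w for w in t_words if full(r"to?n", w)]      # zero or one o
bounded = [w for w in spoons if full(r"spo{1,2}n", w)]   # one or two o

print(star)      # ['tn', 'ton', 'toon']
print(plus)      # ['ton', 'toon']
print(optional)  # ['tn', 'ton']
print(bounded)   # ['spon', 'spoon']

# With a space after the comma the braces no longer act as a quantifier.
print(full(r"spo{1, 2}n", "spoon"))  # False
```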
x|y matches either x or y
so (m|s)ad matches mad or sad.
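A quick way to check the alternation, sketched in Python (my choice of language; the extra words 'bad' and 'ad' are added only for contrast):

```python
import re

# (m|s) accepts either letter, followed by the literal "ad".
pattern = re.compile(r"(m|s)ad")

words = ["mad", "sad", "bad", "ad"]
matched = [w for w in words if pattern.fullmatch(w)]

print(matched)  # ['mad', 'sad']
```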
\d will match any numeric digit
\D will match any NON-digit character
\b = word boundary
e.g. man\b matches the 'man' in heman but does not match in manchester, because there the 'man' is followed by more letters.
\B = non-word boundary
\s matches any whitespace character, including space, tab and form-feed.
\w matches any word character, including underscore. [A-Za-z0-9_]
\W matches any non-word character. [^A-Za-z0-9_]
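The shorthand classes and the word boundary can be tried out on a small sample. This is a rough sketch in Python (an assumption, any regex flavor with these escapes behaves much the same), and the sample strings are invented for the example.

```python
import re

text = "heman visited manchester on 21 August 2008"

digits = re.findall(r"\d+", text)            # runs of digits
ends_in_man = re.findall(r"\w*man\b", text)  # "man" at the end of a word
starts_man = re.findall(r"\bman\w*", text)   # "man" at the start of a word
inner_man = re.findall(r"\Bman", text)       # "man" not at the start of a word

only_digits = re.sub(r"\D", "", "21-08-2008")         # drop every non-digit
pieces = re.split(r"\s+", "space tab\tfeed\fend")     # split on whitespace
non_word = re.findall(r"\W", "snake_case, ok!")       # underscore is a word character

print(digits)       # ['21', '2008']
print(ends_in_man)  # ['heman']
print(starts_man)   # ['manchester']
print(inner_man)    # ['man']
print(only_digits)  # 21082008
print(pieces)       # ['space', 'tab', 'feed', 'end']
print(non_word)     # [',', ' ', '!']
```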
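\k is the backreference construct. In flavors such as .NET, \k<name> matches the same text that the named group (?<name>...) captured earlier in the pattern; numbered groups are referenced as \1, \2 and so on. For example, the word 'little' has 2 t's, so the expression (?<ch>t)\k<ch> captures the first 't' and then requires the very same character to follow immediately, which matches the 'tt'. In other words, it looks back at what was already captured and checks whether it occurs again right at the current position.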
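Here is a small sketch of backreferences in Python. Note the assumption: Python does not use the \k spelling, it writes a named backreference as (?P=name), while the numbered form \1 is the same as in most flavors. The words other than 'little' are invented for contrast.

```python
import re

# Numbered form: capture one word character, then demand the same one again.
doubled = re.compile(r"(\w)\1")
# Named form: (?P<ch>...) captures, (?P=ch) refers back to it.
named = re.compile(r"(?P<ch>\w)(?P=ch)")

first = doubled.search("little")
print(first.group())  # tt

words = ["little", "letter", "later"]
with_double = [w for w in words if named.search(w)]
print(with_double)  # ['little', 'letter']
```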
Thursday, August 21, 2008