Regular Expressions: a simple, easy tutorial

I am reposting this page from phi.lho.free.fr, which has awesome information, because the original site was unavailable when I checked it. Just to be clear, this is not my work, and the copyright info is listed at the bottom.

It needed reposting, so here it is in all its g33ky glory…

—————————————————————-

First, if you found this page by chance, you may ask: what are regular expressions (usually abbreviated as RegExp, RegEx or RE)? In a few words, they are a powerful (but a bit geeky) way to manipulate text.

With them, you can see if a generic string (e.g. “5 letters followed by 2 digits”) is inside a text, extract a string out of a text (e.g. getting the current version number from a software download page), check if a string meets some criteria (did the user type a date in the right format?), transform a text (morph a list of C’s #defines into a list of variable assignments in your favorite language), split a string with complex requirements (e.g. get all the words of a natural-language text, separated by spaces or punctuation signs), etc.
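To make these five uses concrete, here is a minimal sketch in Python (my choice of language for illustration; the original page does not prescribe one). The sample strings, the three-part version format and the YYYY-MM-DD date format are assumptions invented for the example, not something taken from the tutorial.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Download hello v2.4.1 today, ref abcde42."

# 1. Search: is "5 letters followed by 2 digits" somewhere in the text?
found = re.search(r"[A-Za-z]{5}\d{2}", text) is not None

# 2. Extract: pull out a version number (assumed to look like N.N.N).
match = re.search(r"\d+\.\d+\.\d+", text)
version = match.group(0) if match else None

# 3. Validate: does the whole string look like a date (assumed YYYY-MM-DD)?
#    This checks the shape only, not whether the date really exists.
def looks_like_date(s):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

# 4. Transform: turn C #defines into plain variable assignments.
defines = "#define WIDTH 640\n#define HEIGHT 480"
assignments = re.sub(r"^#define\s+(\w+)\s+(.+)$", r"\1 = \2",
                     defines, flags=re.MULTILINE)

# 5. Split: get the words, separated by spaces or punctuation signs.
sentence = "Hello, world! How are you?"
words = [w for w in re.split(r"[\s.,;:!?]+", sentence) if w]
```

Each task is a single call with a short pattern, which is the point made just below: most everyday uses need only simple expressions.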

The drawback is their syntax, quite cryptic for the uninitiated (and sometimes for the initiated…), but with practice it turns out that most tasks use rather simple expressions, so they are not hard to master.

Now, don’t feel bad if you had to ask. I see myself as a seasoned professional programmer, with more than 15 years of paid experience, and more than 25 years of programming practice (going back to my first programmable calculator!). Yet, I started only recently to really use REs.

That’s because in most of the programming languages I used (C[++], Pascal, various Basics, assembly languages), regexps are not fully part of the language: they need an external, non-standard library, and thus their use wasn’t commonplace.

I used them a bit with Unix tools like grep or sed, but only at a low level of expertise. I actually started to use them with the Lua language, which has its own implementation, and when the SciTE source code editor integrated a simple implementation of REs. Being a bit primitive, it was less intimidating and I found myself using them more and more.

Then I learned languages fully integrating regexps. A bit of Perl, of course, but I am still a beginner here; JavaScript, which is quite close to Perl in RE syntax; PHP, with both its POSIX implementation (ereg, outdated) and its PCRE-based one (preg); and Java, fully integrating REs since version 1.4.

Actually, you are probably using REs without knowing it… At least a very primitive, very simplified version of them, called meta characters in MS-DOS: in file patterns, ? replaces any character and * replaces any sequence of characters, as in *.jpg or backup.00?.
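To show how close these file patterns are to real REs, here is a small sketch in Python that rewrites one into the other. The helper names wildcard_to_regex and wildcard_match are hypothetical, made up for this example, and the translation follows only the simplified description above (? is exactly one character, * is any sequence); the real MS-DOS matching rules have extra quirks that are ignored here.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simplified file pattern into a regular expression.

    Assumption: only ? and * are special, as described in the text.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # any single character
        elif ch == "*":
            parts.append(".*")           # any sequence of characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)

def wildcard_match(pattern, name):
    """Return True if the whole file name matches the pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), name) is not None
```

With this sketch, wildcard_match("*.jpg", "photo.jpg") is true, while wildcard_match("backup.00?", "backup.0010") is false because ? stands for exactly one character. Python's standard library offers fnmatch for real use; the hand-written version is only there to make the correspondence visible.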