Regular Expression w/Python

 

[Image: regular expression example]

Do the statements above look like a bunch of gibberish? Welcome, my friend, to the world of regular expressions.

A regular expression is a way of searching text for patterns. For example, if you have a string and you want to find all the text in it that looks like an address, you need to know what the pattern of an address is. An address like 324 Main St. starts with a number, then a space, then a word made of letters or digits (for streets like “Third” or “3rd”), then another space, and finally any other characters up to a “.” (for St., Ave., Blvd.). You can express that as the following:

[Image: color-coded regular expression for the address pattern]
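The pictured expression isn't reproduced in text here, so below is a minimal sketch of one way to write the address rule described above with Python's built-in `re` module. The pattern `ADDRESS_PATTERN` and the sample sentence are my own assumptions, not the pattern from the image, and the rule is deliberately loose (it will happily match things that aren't real addresses).

```python
import re

# Hypothetical pattern for the address rule described above (an assumption,
# not the pictured expression):
#   \d+    one or more digits (the house number)
#   " "    a single space
#   \w+    a street name made of letters or digits ("Main", "Third", "3rd")
#   " "    another space
#   [^.]*  any run of characters that are not a period
#   \.     a literal period, as in "St.", "Ave.", "Blvd."
ADDRESS_PATTERN = re.compile(r"\d+ \w+ [^.]*\.")

text = "I used to live at 324 Main St. and now I live at 12 3rd Ave. downtown."

# findall returns every non-overlapping match, left to right.
addresses = ADDRESS_PATTERN.findall(text)
print(addresses)  # ['324 Main St.', '12 3rd Ave.']
```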

Most modern languages have similar libraries or packages for regular expressions; in Python it is the built-in re module. They let you describe patterns of characters to search for, and then split, replace, and so on.
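Here is a quick, hedged tour of three everyday functions from Python's `re` module: searching, splitting and replacing. The sample strings are made up for illustration.

```python
import re

# re.search returns a Match object for the first match, or None if there is none.
match = re.search(r"\d+", "Call 805-234-2344 today")
first_number = match.group() if match else None  # '805'

# re.split breaks a string wherever the pattern matches (here: a space or a dash).
parts = re.split(r"[\s-]", "564 344-2341")  # ['564', '344', '2341']

# re.sub replaces every match; here each digit becomes "#".
masked = re.sub(r"\d", "#", "805-234-2344")  # '###-###-####'

print(first_number, parts, masked)
```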

Two useful things to know when using regular expressions are the Kleene star (*) and the Kleene plus (+). They mean “zero or more” and “one or more” of the character that comes right before the symbol. If you were searching text for numbers of different lengths, you could use \d*, which matches zero or more digits (so it also matches an empty string); if you want to ensure that there is at least one digit, use \d+.
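A small sketch of the difference between * and +, using `re.fullmatch` so the whole string has to fit the pattern. The test strings are arbitrary examples.

```python
import re

# \d* means "zero or more digits", so even the empty string matches.
star_empty = re.fullmatch(r"\d*", "") is not None  # True
# \d+ means "one or more digits", so the empty string does not match.
plus_empty = re.fullmatch(r"\d+", "") is not None  # False

# The quantifier applies to the character just before it: in ab* and ab+
# only the "b" is repeated.
star_a = re.fullmatch(r"ab*", "a") is not None        # True: zero b's is fine
plus_a = re.fullmatch(r"ab+", "a") is not None        # False: needs at least one b
plus_abbb = re.fullmatch(r"ab+", "abbb") is not None  # True

# findall with \d+ pulls out each run of digits, whatever its length.
runs = re.findall(r"\d+", "a1b22c333")  # ['1', '22', '333']

print(star_empty, plus_empty, star_a, plus_a, plus_abbb, runs)
```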

Let’s say that you want to get all the phone numbers within a document. You can’t simply say “give me all the 7-digit numbers,” because people write their phone numbers in different ways:

  • 805-234-2344
  • (323)823-2384
  • 564 344 2341
  • 1 238 098 9482
  • 889-477-3656

To get all the phone numbers without regular expressions, you would have to do multiple searches for the different formatting styles. With a regular expression you only need one:

1?[\s-]?\(?(\d{3})\)?[\s-]?\d{3}[\s-]?\d{4}

Mind you, this looks hairy and totally crazy, but this one expression will match all of the numbers above. The ‘1?’ takes care of an optional country code, ‘\(?(\d{3})\)?’ gets the area code with or without parentheses, each ‘[\s-]?’ allows an optional space or dash between the groups, and ‘\d{3}[\s-]?\d{4}’ gets the 3-digit sequence and the 4-digit sequence. Pretty cool! Thanks @RegexOne for the example!
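Here is a sketch of how you might run the phone pattern above in Python. The sample sentence is made up, and the `.strip()` call is my own workaround: because of the leading optional ‘[\s-]?’, the match can start at the space just before a number.

```python
import re

# The phone pattern from the post, compiled once for reuse.
PHONE = re.compile(r"1?[\s-]?\(?(\d{3})\)?[\s-]?\d{3}[\s-]?\d{4}")

numbers = [
    "805-234-2344",
    "(323)823-2384",
    "564 344 2341",
    "1 238 098 9482",
    "889-477-3656",
]

# fullmatch checks that the *whole* string fits the pattern.
all_match = all(PHONE.fullmatch(n) is not None for n in numbers)  # True

text = "Call 805-234-2344 or (323)823-2384."

# Gotcha: when a pattern has one capturing group, findall returns only that
# group, so this gives the area codes rather than the full numbers.
area_codes = PHONE.findall(text)  # ['805', '323']

# For the full numbers, use finditer and group(0). The optional [\s-]? near
# the start can swallow the space before a number, so strip it off.
full_numbers = [m.group(0).strip() for m in PHONE.finditer(text)]
# ['805-234-2344', '(323)823-2384']

print(all_match, area_codes, full_numbers)
```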

I am in the process of moving, and when I submitted my Change of Address to the Post Office, my new address wasn’t processing: it said that it wasn’t a valid address. I had entered something like “123 Main St. # 101”. Since regular expressions were fresh in my mind, I thought, “Maybe they are not taking into account the space in # 101,” so I tried again without the space, and it worked! Ha!
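I have no idea what the Post Office’s form actually checks, so this is a purely hypothetical sketch of how a strict pattern could reject “# 101” while accepting “#101”, and how a single optional ‘\s?’ would fix it. Both `STRICT_UNIT` and `LENIENT_UNIT` are made-up names for illustration.

```python
import re

# Hypothetical strict rule: "#" must be immediately followed by digits.
STRICT_UNIT = re.compile(r"#\d+")
# Hypothetical forgiving rule: \s? allows one optional whitespace character after "#".
LENIENT_UNIT = re.compile(r"#\s?\d+")

results = {
    unit: (STRICT_UNIT.fullmatch(unit) is not None,
           LENIENT_UNIT.fullmatch(unit) is not None)
    for unit in ["#101", "# 101"]
}
print(results)  # {'#101': (True, True), '# 101': (False, True)}
```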

These are some great resources that I found helpful:
