Regular Expressions Problem with Parenthesis in the String

pr0gr4mm3r's picture

He has: 1,502 posts

Joined: Sep 2006

I can't seem to figure out this regular expression. Here is what I have that works:

/[0-9]{3,4} AM|PM [A-Z]{3,4} ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}/

It correctly matches something like the following date:

928 PM EST MON DEC 1 2008

...but I also need it to match a date like this:

928 PM EST (828 PM CST) MON DEC 1 2008

...and, for the life of me, I can't get parentheses in my regular expression to work properly. I know they have to be escaped, but it still doesn't work. This is my best effort:

/[0-9]{3,4} AM|PM [A-Z]{3,4} (\([0-9]{3,4} AM|PM [A-Z]{3,4}\) )?([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}/

Any help would be appreciated. :)

teammatt3's picture

He has: 2,102 posts

Joined: Sep 2003

I can't help much, but is there any reason you don't want to do this with string functions?

You could explode the string on the space, and check each chunk. Regular expressions are hard to maintain, and you can never really tell whether they get the job done securely.
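A minimal sketch of the split-and-check idea above, written in Python rather than the PHP that `explode` hints at (the thread never names its language, so that is an assumption). The function name `parse_timestamp` and the day and month lists are hypothetical, and the sketch assumes the date string has already been isolated from the surrounding text.

```python
# Hypothetical day and month abbreviations; the thread only shows MON and DEC.
DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}
MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def parse_timestamp(text):
    """Return the seven date tokens, or None if any chunk fails its check."""
    tokens = text.split()
    # Drop an optional second time such as "(828 PM CST)", which splits
    # into three tokens: "(828", "PM", "CST)".
    if len(tokens) == 10 and tokens[3].startswith("(") and tokens[5].endswith(")"):
        del tokens[3:6]
    if len(tokens) != 7:
        return None
    time, meridiem, zone, day, month, day_of_month, year = tokens
    checks = [
        time.isdigit() and len(time) in (3, 4),
        meridiem in ("AM", "PM"),
        zone.isalpha() and zone.isupper() and len(zone) in (3, 4),
        day in DAYS,
        month in MONTHS,
        day_of_month.isdigit() and len(day_of_month) in (1, 2),
        year.isdigit() and len(year) == 4,
    ]
    return tokens if all(checks) else None


print(parse_timestamp("928 PM EST MON DEC 1 2008"))
print(parse_timestamp("928 PM EST (828 PM CST) MON DEC 1 2008"))
```

Both calls return the same seven tokens, because the parenthesised time is discarded rather than validated. Finding the date inside a larger block of text would still need a separate step.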

Good luck with that, and post your final result. :)

pr0gr4mm3r's picture


The date string I posted is what I'm trying to extract from a larger block of text. I'm extracting the date string from NWS's Zone Forecast Product. Here are two stations as an example. Those pages I linked are for entire stations, so you see several forecasts for the zones within that station's coverage area. I have a regular expression that parses the zones OK, but it's the dates that I can't seem to isolate. I need a regular expression match on the date so I know when each forecast was last updated.

The main issue I'm having with this specific one is that I can't get a parenthesis to work in the expression. Is there something I have to do besides escaping it with a '\'?

teammatt3's picture


Will you take a POSIX?

([0-9]{3,4} (AM)|(PM) [A-Z]{3,4} ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4})|([0-9]{3,4} (AM)|(PM) [A-Z]{3,4} \([0-9]{3,4} (AM)|(PM) [A-Z]{3,4}\) ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4})

I took your first regex, which worked for the first match, then made another regex that matched the second example, and combined them with a |.

teammatt3's picture


I don't think you're escaping the ) wrong, because when you replace it with some random letter, it still won't match:

/[0-9]{3,4} AM|PM [A-Z]{3,4} (\([0-9]{3,4} AM|PM [A-Z]{3,4}K )?([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}/

Should match (it doesn't):

928 PM EST (828 PM CSTTK MON DEC 1 2008

So I don't think the problem is with the escaping.
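One likely explanation, sketched here in Python's `re` module (an assumption; the thread appears to use PCRE, which treats these constructs the same way): `|` has the lowest precedence in a pattern, so an ungrouped `AM|PM` splits the whole expression into two alternatives instead of choosing between two words. The names `ORIGINAL`, `BEST_EFFORT`, `GROUPED`, and `find_date` are hypothetical, and `GROUPED` is one possible correction, not a fix confirmed in the thread.

```python
import re

# The two patterns as posted, without the / delimiters.
ORIGINAL = r"[0-9]{3,4} AM|PM [A-Z]{3,4} ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}"
BEST_EFFORT = (r"[0-9]{3,4} AM|PM [A-Z]{3,4} (\([0-9]{3,4} AM|PM [A-Z]{3,4}\) )?"
               r"([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}")

# Hypothetical correction: wrap each AM|PM in a non-capturing group so the
# alternation only covers those two words.
GROUPED = (r"[0-9]{3,4} (?:AM|PM) [A-Z]{3,4} "
           r"(?:\([0-9]{3,4} (?:AM|PM) [A-Z]{3,4}\) )?"
           r"(?:[A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}")

PLAIN = "928 PM EST MON DEC 1 2008"
WITH_PARENS = "928 PM EST (828 PM CST) MON DEC 1 2008"


def find_date(pattern, text):
    """Return the text matched by pattern, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(find_date(ORIGINAL, PLAIN))           # PM EST MON DEC 1 2008
print(find_date(BEST_EFFORT, WITH_PARENS))  # None
print(find_date(GROUPED, PLAIN))            # 928 PM EST MON DEC 1 2008
print(find_date(GROUPED, WITH_PARENS))      # 928 PM EST (828 PM CST) MON DEC 1 2008
```

Note that the original pattern does report a match on the plain date, but the match starts at `PM` and leaves out the leading `928`.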

teammatt3's picture


Ooooh, try this PCRE one:

/\(?[0-9]{3,4} AM|PM [A-Z]{3,4}\)? ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}/

That is your first regex, except it allows for 0 or 1 opening and closing parentheses.
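A pattern can report a match without covering the whole date, so it may be worth printing the matched text before relying on it. The sketch below does that for the pattern above using Python's `re` (an assumption; the post targets PCRE). The names `SUGGESTED`, `SAMPLES`, and `matched_text` are hypothetical.

```python
import re

# The suggested pattern, without the / delimiters.
SUGGESTED = re.compile(
    r"\(?[0-9]{3,4} AM|PM [A-Z]{3,4}\)? ([A-Z]{3,4} ){2}[0-9]{1,2} [0-9]{4}"
)

SAMPLES = [
    "928 PM EST MON DEC 1 2008",
    "928 PM EST (828 PM CST) MON DEC 1 2008",
]


def matched_text(pattern, text):
    """Return exactly the substring the pattern matched, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


for sample in SAMPLES:
    print(repr(sample), "->", repr(matched_text(SUGGESTED, sample)))
# '928 PM EST MON DEC 1 2008' -> 'PM EST MON DEC 1 2008'
# '928 PM EST (828 PM CST) MON DEC 1 2008' -> 'PM CST) MON DEC 1 2008'
```

Both samples match, but only from a `PM` onward: the ungrouped `|` makes everything to the right of `AM` a separate alternative, so the leading time is not part of the result.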

Sorry for the triple post. :)

pr0gr4mm3r's picture


"Sorry for the triple post"

Post as much as you want if you have a solution. :) I was talking to kazimmerman earlier over IM, and he helped me get close, but I think this one will work. I copied it into your tool, and sure enough, it goes green. Thanks so much for the help. :)
