how to curb this quantifier's greediness
Hi - I'm trying to get a Perl script to parse an HTML file and replace an image name like "submit.gif" with the full URL of that image, like "http://www.mydomain.com/submit.gif".
It's working on regular image tags, but it's failing on a graphical submit button (see below). What's wrong with the following regular expression in Perl:
$html_line = "";
$image_location = "http://www.domain.com/";
$html_line =~ s/(['"])(.+\.gif)?['"]/$1$image_location$2$1/g;
The result is:
It should be:
TIA...
necrotic posted this at 04:57 — 2nd February 2003.
He has: 296 posts
Joined: May 2002
Maybe it's mistaking TYPE for SRC.
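One way to check that guess is to print what the second capture group actually grabs. The sketch below assumes the failing line looks like `<input type="image" src="submit.gif">`; the original line did not survive in the post, so that input is a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input; the original failing line is not shown in the thread.
my $html_line = '<input type="image" src="submit.gif">';

# Same pattern as in the question, used as a plain match to inspect $2.
my $captured = '';
if ($html_line =~ /(['"])(.+\.gif)?['"]/) {
    $captured = $2;
}
print "captured: $captured\n";   # prints: captured: image" src="submit.gif
```

The greedy `.+` starts right after the first quote on the line (the one in `type="image"`) and runs to the last `.gif`, so the prefix lands in front of `image` instead of `submit.gif`.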
Mark Hensler posted this at 08:23 — 2nd February 2003.
He has: 4,048 posts
Joined: Aug 2000
Woohoo, I love regex!
input.html:
this is a line
<img src="sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="sample2.gif">
line 5
test.pl:
#!/usr/bin/perl
$input_file = "input.html";
$image_location = "http://www.domain.com/";
open (FILE, "$input_file") || die ("Couldn't open $input_file: $!");
@input = <FILE>;
close (FILE);
foreach $html_line (@input) {
$html_line =~ s/((['])?(["])?)((?(2)([^']+?)|([^"]+?))\.gif)(?(2)'|")/$1$image_location$4$1/g;
print $html_line;
}
output:
this is a line
<img src="http://www.domain.com/sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="http://www.domain.com/sample2.gif">
line 5
I haven't used Perl much lately. I can't remember how to say "I want a single or double quote, then a string, then another quote like the one I got before", so I did it the roundabout way. It's not bulletproof, but it passed my simple input.html test. I could write a better PHP script because I know how to do the quote matching there.
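For reference, "another quote like the one I got before" is usually written with a backreference: `\1` matches whatever the first capture group matched. This is a minimal sketch, not a replacement for the script above. The helper name `prefix_gifs` is made up for illustration, and the character class `[^'"]+` adds the assumption that image names never contain quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prepend $base to every quoted *.gif name in $line.
sub prefix_gifs {
    my ($line, $base) = @_;
    # (['"])   opening quote, captured as group 1
    # [^'"]+   one or more non-quote characters, so the match cannot
    #          run past the end of the current attribute value
    # \1       closing quote, the same character as the opening one
    $line =~ s/(['"])([^'"]+\.gif)\1/$1$base$2$1/g;
    return $line;
}

print prefix_gifs(qq{<input type="image" src="sample2.gif">\n}, "http://www.domain.com/");
print prefix_gifs(qq{<img src='sample.gif' height="1">\n}, "http://www.domain.com/");
```

Image names that are already absolute would be prefixed a second time, so this is only safe on pages that use relative image names.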
Mark Hensler
If there is no answer on Google, then there is no question.
Renegade posted this at 09:32 — 2nd February 2003.
He has: 3,022 posts
Joined: Oct 2002
Is it really a good idea to use the full path? It will make the site take longer to load...
critical posted this at 13:04 — 2nd February 2003.
They have: 46 posts
Joined: May 2002
Mark - thank you - it definitely works. Someday I'll get around to trying to understand *how* it works.
Regarding the full path to images - I've got a .cgi script in the cgi-bin dynamically creating a web page - without the full path the images will be looked for in the cgi-bin and will not display.
Wil posted this at 13:05 — 2nd February 2003.
They have: 601 posts
Joined: Nov 2001
Why not use HTML::TokeParser ?
Mark Hensler posted this at 19:32 — 2nd February 2003.
He has: 4,048 posts
Joined: Aug 2000
I never got that deep into Perl. Can you show me how HTML::TokeParser would work?
Wil posted this at 10:24 — 3rd February 2003.
They have: 601 posts
Joined: Nov 2001
Hi Mark
I'd use something like this to grab all image links and prepend a URL (untested code):
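#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
my $input_file = "input.html";
my $image_location = "http://www.domain.com/";
# new() returns undef if the file cannot be opened
my $p = HTML::TokeParser->new($input_file) || die "Couldn't open $input_file: $!";
# get_tag() returns [$tag, $attr, $attrseq, $text]; the attribute hash is ->[1]
while (my $token = $p->get_tag("img", "input")) {
    my $src = $token->[1]{src};
    next unless defined $src;    # e.g. a plain <input type="text">
    print $image_location . $src . "\n";
}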
- wil
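Building on the idea above, a parser-based version can also rewrite the page rather than just list the URLs. The following is a rough sketch and has not been tried on real-world pages. The helper name `absolutize_images` is invented here, it assumes HTML::TokeParser is installed, and it treats any `src` that starts with a scheme or a slash as already absolute.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return $html with $base prepended to the relative
# src of every <img> and <input type="image"> tag.
sub absolutize_images {
    my ($html, $base) = @_;
    # A scalar reference tells the parser to read from the string itself.
    my $p = HTML::TokeParser->new(\$html) or die "Cannot set up parser";
    my $out = '';
    while (my $token = $p->get_token) {
        my $type = $token->[0];
        # Original source text: index 1 for text tokens, last element otherwise.
        my $text = $type eq 'T' ? $token->[1] : $token->[-1];
        if ($type eq 'S') {
            my ($tag, $attr) = @{$token}[1, 2];
            my $is_image = $tag eq 'img'
                || ($tag eq 'input' && lc($attr->{type} // '') eq 'image');
            # Only touch relative src values inside image tags.
            if ($is_image && defined $attr->{src}
                && $attr->{src} !~ m{^(?:\w+:|/)}) {
                $text =~ s/(\ssrc\s*=\s*["']?)/$1$base/i;
            }
        }
        $out .= $text;
    }
    return $out;
}

print absolutize_images(
    qq{<input type="image" src="submit.gif">\n},
    "http://www.domain.com/",
);
```

Because the tag name and `type` attribute are checked before anything is changed, a quoted `.gif` string elsewhere on the page is left alone, which is the main advantage over a line-based substitution.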