how to curb this quantifier's greediness

They have: 46 posts

Joined: May 2002

Hi - I'm trying to get a Perl script to parse an HTML file and replace an image reference like "submit.gif" with the full URL of that image, like "http://www.mydomain.com/submit.gif".

It's working on regular image tags, but it's failing on a graphical submit button (see below). What's wrong with the following regular expression in Perl:

$html_line = "";

$image_location = "http://www.domain.com/";

$html_line =~ s/(['"])(.+\.gif)?['"]/$1$image_location$2$1/g;

The result is:

It should be:

TIA...

He has: 296 posts

Joined: May 2002

Maybe it's mistaking TYPE for SRC.
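
That guess can be checked with a short script. The tag below is an assumption (the example from the question did not survive in the post), modelled on the graphical submit button described there. The match starts at the leftmost quote, the one in front of image, and .+ is free to run across quotes, so it swallows everything up to the last .gif. Making the quantifier lazy with .+? does not help here, because the match still begins at that first quote:

```perl
use strict;
use warnings;

my $image_location = "http://www.domain.com/";

# Assumed input line; not taken from the original post.
my $tag = '<input type="image" src="submit.gif">';

# The substitution from the question, with the greedy .+
(my $greedy = $tag) =~ s/(['"])(.+\.gif)?['"]/$1$image_location$2$1/g;

# The same substitution with a lazy .+? instead
(my $lazy = $tag) =~ s/(['"])(.+?\.gif)['"]/$1$image_location$2$1/g;

print "$greedy\n";   # <input type="http://www.domain.com/image" src="submit.gif">
print "$lazy\n";     # same line again: the prefix still lands on TYPE
```

So the prefix is attached to the TYPE value because that is where the first quote is, and what needs curbing is the set of characters the quantifier may cross, not only its greediness.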

Mark Hensler

He has: 4,048 posts

Joined: Aug 2000

Woohoo, I love regex!

input.html:

this is a line
<img src="sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="sample2.gif">
line 5

test.pl:

#!/usr/bin/perl

$input_file = "input.html";
$image_location = "http://www.domain.com/";

# Read the file into an array, one line per element
open (FILE, "<", $input_file) || die ("Couldn't open $input_file: $!");
@input = <FILE>;
close (FILE);

# Prepend $image_location to each quoted .gif filename
foreach $html_line (@input) {
    $html_line =~ s/((['])?(["])?)((?(2)([^']+?)|([^"]+?))\.gif)(?(2)'|")/$1$image_location$4$1/g;
    print $html_line;
}

output:

this is a line
<img src="http://www.domain.com/sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="http://www.domain.com/sample2.gif">
line 5
I haven't used Perl much lately. I can't remember how to say "I want a single or double quote, then a string, then another quote like the one I got before", so I did it the roundabout way. It's not bulletproof, but it passed my simple input.html test. I could write a better PHP script because I know how to do the quote matching there.
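
For what it's worth, the "another quote like the one I got before" part can be written in Perl with a backreference: \1 matches whatever group 1 captured. Here is a minimal sketch, not a bulletproof parser. It also uses a negated character class so the filename cannot run across a quote. The name prefix_gifs is made up for illustration:

```perl
use strict;
use warnings;

my $image_location = "http://www.domain.com/";

# prefix_gifs is a hypothetical helper name.
sub prefix_gifs {
    my ($line) = @_;
    # (['"])         group 1: the opening quote, single or double
    # ([^'"]+\.gif)  group 2: a filename that contains no quote characters
    # \1             the same kind of quote that group 1 matched
    $line =~ s/(['"])([^'"]+\.gif)\1/$1$image_location$2$1/g;
    return $line;
}

print prefix_gifs(q{<input type="image" src="sample2.gif">}), "\n";
# <input type="image" src="http://www.domain.com/sample2.gif">
print prefix_gifs(q{<img src='sample.gif' height="1">}), "\n";
# <img src='http://www.domain.com/sample.gif' height="1">
```

One caveat: a value that is already a full URL ending in .gif would get the prefix a second time, so this only suits pages where every image path is relative.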

Mark Hensler
If there is no answer on Google, then there is no question.

Renegade

He has: 3,022 posts

Joined: Oct 2002

Is it really a good idea to have the full path? It will make the site take longer to load...

They have: 46 posts

Joined: May 2002

Mark - thank you - it definitely works. Someday I'll get around to trying to understand *how* it works.
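
As a rough reading aid, here is the same pattern from test.pl spread over several lines with the x modifier so that each piece can carry a comment. The groups and the replacement are unchanged. The names $quoted_gif and add_location are made up for this sketch:

```perl
use strict;
use warnings;

my $image_location = "http://www.domain.com/";

# The pattern from test.pl, unchanged apart from whitespace and comments.
my $quoted_gif = qr{
    (               # group 1: the opening quote, of either kind
      (['])?        # group 2: set only when that quote is a single quote
      (["])?        # group 3: set only when it is a double quote
    )
    (               # group 4: the filename, up to and including .gif
      (?(2)         # conditional: did group 2 take part in the match?
        ([^']+?)    #   yes: characters that are not a single quote
      |
        ([^"]+?)    #   no: characters that are not a double quote
      )
      \.gif
    )
    (?(2)'|")       # closing quote: single if group 2 matched, else double
}x;

# add_location is a hypothetical helper name.
sub add_location {
    my ($line) = @_;
    # Put back the opening quote, then the prefix, the filename and the quote.
    $line =~ s/$quoted_gif/$1$image_location$4$1/g;
    return $line;
}

print add_location(q{<input type="image" src="sample2.gif">}), "\n";
# <input type="image" src="http://www.domain.com/sample2.gif">
```

The part that keeps it away from the TYPE attribute is the negated character class: the filename is not allowed to contain the quote character, so a match cannot start at type="image" and run on into the src value.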

Regarding the full path to images - I've got a .cgi script in the cgi-bin dynamically creating a web page - without the full path the images will be looked for in the cgi-bin and will not display.

They have: 601 posts

Joined: Nov 2001

Why not use HTML::TokeParser?

Mark Hensler

He has: 4,048 posts

Joined: Aug 2000

I never got that deep into Perl. Can you show me how HTML::TokeParser would work?

They have: 601 posts

Joined: Nov 2001

Hi Mark

I'd use something like this to grab all image links and prepend a URL (untested code):

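#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser;

my $input_file = "input.html";
my $image_location = "http://www.domain.com/";

# new() returns undef if the file can't be opened
my $p = HTML::TokeParser->new($input_file)
    || die "Couldn't open $input_file: $!";

# get_tag() skips ahead to the next <img> start tag;
# element 1 of the returned array ref is the attribute hash
while (my $token = $p->get_tag("img")) {
    my $src = $token->[1]{src};
    next unless defined $src;    # skip <img> tags without a src
    print $image_location . $src . "\n";
}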

- wil
