Can anyone help me out with this

They have: 17 posts

Joined: Mar 2000

I found this script on WDVL and tried to get it to work, but to no avail. Does anyone see a problem with it? Supposedly it parses a web page and returns a summary of its contents.

code:

#!/usr/bin/perl
use CGI;
use LWP::Simple;
use HTML::TokeParser;

$cgiobject=new CGI;
$cgiobject->use_named_parameters;
print $cgiobject->header;

print $cgiobject->start_html(-title=>'Page Parser',-bgcolor=>'white');

print $cgiobject->startform(-method=>'get',-action=>'parsepage.cgi');
print "URL to Analyze:".
$cgiobject->textfield(-name=>'url',-size=>'40');
print "<br>".
$cgiobject->submit(-value=>'Analyze');
print $cgiobject->endform;
print "<hr>";

#retrieve web page
$fetchURL=$cgiobject->param("url");
if(!$fetchURL){
$fetchURL="";
}

$webPage=get($fetchURL);

print <<ENDHTML;
<center><h2>$fetchURL<br>
has been sliced and diced,
thus revealing:</h2></center>
ENDHTML

&parse_title;
&parse_meta_description;
&parse_meta_keywords;
&parse_images;
&parse_hyperlinks;
print $cgiobject->end_html;

sub parse_title{
#parse and output page title
$parser=HTML::TokeParser->new(\$webPage);
$parser->get_tag("title");
print "<p><h2>Page title</h2> ".
$parser->get_trimmed_text."</p>";
}

sub parse_meta_keywords{
#parse and output meta data
$parser=HTML::TokeParser->new(\$webPage);
while (my $token=$parser->get_tag("meta"))
 { if ($token->[1]{name}=~/keywords/i)
   { print "<p><h2>Meta Keywords</h2> ".$token->[1]{content}."</p>" }
 }
}

sub parse_meta_description{
#parse and output meta data
$parser=HTML::TokeParser->new(\$webPage);
while (my $token=$parser->get_tag("meta"))
 { if ($token->[1]{name}=~/description/i)
  { print "<p><h2>Meta Description</h2> ".$token->[1]{content}."</p>" }
 }
}

sub parse_images{
#parse and count images
$parser=HTML::TokeParser->new(\$webPage);
my $imageTotal=0;
while ($parser->get_tag("img"))
 { $imageTotal++ }
print "<p><h2>Image Count</h2> "."Total = $imageTotal</p>";
} 

sub parse_hyperlinks{
#parse and output hyperlinks
$parser=HTML::TokeParser->new(\$webPage);
print "<p><h2>Hyperlink Summary</h2>";
while (my $token = $parser->get_tag("a")) 
 { my $linkURL = $token->[1]{href} || "-";
   my $linkText = $parser->get_trimmed_text("/a");
   if ($linkText=~/<image/i) {$linkText="image"}
   print "<small>$linkText</small> "."<b>links to</b> $linkURL<br>"
}
}
[/code] 

They have: 123 posts

Joined: Dec 1999

Is Perl actually installed at /usr/bin/perl on your server, or somewhere different? The #! line has to point at it.

They have: 122 posts

Joined: Jun 2000

What errors do you get? Or does it only print out some of the output?

They have: 1,587 posts

Joined: Mar 1999

Where can I get LWP::Simple?


They have: 122 posts

Joined: Jun 2000

LWP::Simple is a Perl module (part of the libwww-perl distribution). You can get it, and many others, from CPAN at www.cpan.org

They have: 17 posts

Joined: Mar 2000

The script seems to run fine, but it does not return any output from the actual parsing. All I get is the page with the form and the text "url has been sliced and diced, thus revealing:", and that's it. It doesn't return any of the info from the pages it fetches.

They have: 17 posts

Joined: Mar 2000

I forgot to give a working example of this script so here goes: http://www.wdvl.com/Authoring/Languages/Perl/PerlfortheWeb/cgi-bin/parsepage.cgi

They have: 123 posts

Joined: Dec 1999

I think the problem is how you enter the URL. You must enter it as http://www.dqn.org/ instead of dqn.org. It worked fine for me when I included the http:// in the URL.
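LWP::Simple's get() needs an absolute URL and returns undef when it cannot fetch one, so a bare host name like dqn.org produces an empty report. A minimal sketch of a guard, assuming a hypothetical helper named normalize_url (it is not part of the original script) that prepends http:// when no scheme was typed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): make sure the value
# typed into the form is an absolute URL before it is handed to get().
sub normalize_url {
    my ($url) = @_;
    return '' unless defined $url;

    # Trim stray whitespace from the form field.
    $url =~ s/^\s+//;
    $url =~ s/\s+$//;
    return '' if $url eq '';

    # Prepend http:// only when no scheme (http://, ftp://, ...) is present.
    $url = "http://$url" unless $url =~ m{^[a-z][a-z0-9+.-]*://}i;
    return $url;
}

print normalize_url('dqn.org'), "\n";               # http://dqn.org
print normalize_url('http://www.dqn.org/'), "\n";   # http://www.dqn.org/
```

In the script above, this would probably slot in where the parameter is read, e.g. $fetchURL = normalize_url($cgiobject->param("url"));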


They have: 2,076 posts

Joined: Jun 1999

Wow, that's a cool script. I found the problem: you have to type the full URL for it to work. So "http://www.fireburn.com" would work but "fireburn.com" wouldn't.

* edit: darn, orlando, you posted a minute before me :) *

[This message has been edited by Justin Stayton (edited 14 March 2000).]

    They have: 17 posts

    Joined: Mar 2000

    Did you guys install the script on your own servers? Because it works fine on WDVL, but I can't get it to return anything on mine.

    They have: 568 posts

    Joined: Nov 1999

    I think the problem may be with CGI.pm

    Maybe your server is running an old version of it, or of LWP. LWP, however, rarely gets updated, and it is bundled with many Perl installations (though it is not part of the core distribution).

    If CGI.pm isn't working right it may return invalid headers...

    They have: 17 posts

    Joined: Mar 2000

    First off, thanks for the help. I still haven't been able to figure this thing out, and it's been driving me nuts. To keep things simple and work out why the script isn't working, I resorted to this:

    code:

    #!/usr/bin/perl
    use CGI;
    use LWP::Simple;
    use HTML::TokeParser;
    
    $cgiobject=new CGI;
    print $cgiobject->header;
    
    $page = get("http://www.yahoo.com/");
    
    $p=HTML::TokeParser->new(\$page);
    
    if ($p->get_tag("title")) {
          my $title = $p->get_trimmed_text;
          print "Title: $title\n";
    }
    
    while (my $token = $p->get_tag("a")) {
    my $url = $token->[1]{href} || "-";
    my $text = $p->get_trimmed_text("/a");
    print "$url\t$text";
    print "<br>";
    }
    
    print $cgiobject->end_html;
    [/code]
    
    For some reason this returns "document contained no data". Anyone? 

    They have: 568 posts

    Joined: Nov 1999

    You're using Netscape, aren't you?

    That message just means the server is sending back an empty document, which means one of two things:

    1) LWP isn't pulling the page
    2) Something is wrong with CGI.pm and it isn't generating the HTML correctly.
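One way to tell these two cases apart is to check what get() returned before anything else is printed, ideally by running the script from the shell so the web server is out of the picture. A rough sketch, assuming a hypothetical helper named fetch_report; fixed values stand in for the real get() call so it runs without network access:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the return value of LWP::Simple's get()
# into a one-line report. get() returns undef when the fetch fails.
sub fetch_report {
    my ($url, $content) = @_;
    return "FAILED: get() returned undef for $url" unless defined $content;
    return 'OK: fetched ' . length($content) . " bytes from $url";
}

# In the real test this would be: fetch_report($url, get($url)).
# Fixed values are used here so the sketch runs on its own.
print fetch_report('http://www.yahoo.com/', undef), "\n";
print fetch_report('http://www.yahoo.com/', '<html></html>'), "\n";
```

If the report says the fetch failed, the problem is on the LWP side (case 1). If it reports a sensible byte count but the browser still shows nothing, the header or HTML output is the more likely suspect (case 2).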

    They have: 17 posts

    Joined: Mar 2000

    A prophet? Yup, Netscape it is.

    Actually, LWP is working fine. I know that only because I use it in other scripts. How would I be able to find out what version of CGI.pm I have?
    Also, if I use mirror instead of get, I actually get the output I want, but I'll pretty much go crazy if I don't find out why this script doesn't work as it is.
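For the version question, a module's version can usually be read through its VERSION method once the module is loaded. A small sketch, assuming a hypothetical helper named module_version; the three module names are simply the ones this script uses:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the version of an installed module,
# 'unknown' if it loads but declares no version, or undef if it
# cannot be loaded at all.
sub module_version {
    my ($module) = @_;

    # Convert Some::Module into Some/Module.pm so require can load it.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };

    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(CGI LWP::Simple HTML::TokeParser)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : 'not installed'), "\n";
}
```

Run from the shell on the server in question, this should show at a glance whether any of the three modules is missing or noticeably older than expected.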

    They have: 568 posts

    Joined: Nov 1999

    You could just stop using CGI.pm.
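For a test script like the one above, dropping CGI.pm mostly means printing the Content-type header and the HTML by hand. A minimal sketch under that assumption; escape_html and plain_response are hypothetical names, and the page content is hard-coded instead of fetched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a complete CGI response by hand: header, blank line, then the body.
sub plain_response {
    my ($title, @lines) = @_;
    my $safe_title = escape_html($title);
    return "Content-type: text/html\n\n"
         . "<html><head><title>$safe_title</title></head><body>\n"
         . join("<br>\n", map { escape_html($_) } @lines)
         . "\n</body></html>\n";
}

$| = 1;    # unbuffered output, so the header goes out before anything else
print plain_response('Test', 'Title: Yahoo!', 'http://www.yahoo.com/');
```

If this version shows up in the browser while the CGI.pm version does not, that points at CGI.pm or its header output rather than at LWP.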

    They have: 123 posts

    Joined: Dec 1999

    Is your website on a Cobalt server? Cobalt servers are known to have problems with CGI scripts because of the CGI wrapper. That might be the problem. Check with the server admin.

    I do not think the script is the problem...
