Can anyone help me out with this?
I found this script on WDVL and tried to get it to work, but to no avail. Does anyone see a problem with it? Supposedly it parses a web page and returns a summary of its contents.
code:
#!/usr/bin/perl
use CGI;
use LWP::Simple;
use HTML::TokeParser;

$cgiobject=new CGI;
$cgiobject->use_named_parameters;
print $cgiobject->header;
print $cgiobject->start_html(-title=>'Page Parser',-bgcolor=>'white');
print $cgiobject->startform(-method=>'get',-action=>'parsepage.cgi');
print "URL to Analyze:". $cgiobject->textfield(-name=>'url',-size=>'40');
print "<br>". $cgiobject->submit(-value=>'Analyze');
print $cgiobject->endform;
print "<hr>";

#retrieve web page
$fetchURL=$cgiobject->param("url");
if(!$fetchURL){ $fetchURL=""; }
$webPage=get($fetchURL);

print <<ENDHTML;
<center><h2>$fetchURL<br> has been sliced and diced, thus revealing:</h2></center>
ENDHTML

&parse_title;
&parse_meta_description;
&parse_meta_keywords;
&parse_images;
&parse_hyperlinks;
print $cgiobject->end_html;

sub parse_title{
    #parse and output page title
    $parser=HTML::TokeParser->new(\$webPage);
    $parser->get_tag("title");
    print "<p><h2>Page title</h2> ". $parser->get_trimmed_text."</p>";
}

sub parse_meta_keywords{
    #parse and output meta data
    $parser=HTML::TokeParser->new(\$webPage);
    while (my $token=$parser->get_tag("meta")) {
        if ($token->[1]{name}=~/keywords/i) {
            print "<p><h2>Meta Keywords</h2> ".$token->[1]{content}."</p>"
        }
    }
}

sub parse_meta_description{
    #parse and output meta data
    $parser=HTML::TokeParser->new(\$webPage);
    while (my $token=$parser->get_tag("meta")) {
        if ($token->[1]{name}=~/description/i) {
            print "<p><h2>Meta Description</h2> ".$token->[1]{content}."</p>"
        }
    }
}

sub parse_images{
    #parse and count images
    $parser=HTML::TokeParser->new(\$webPage);
    my $imageTotal=0;
    while ($parser->get_tag("img")) { $imageTotal++ }
    print "<p><h2>Image Count</h2> "."Total = $imageTotal</p>";
}

sub parse_hyperlinks{
    #parse and output hyperlinks
    $parser=HTML::TokeParser->new(\$webPage);
    print "<p><h2>Hyperlink Summary</h2>";
    while (my $token = $parser->get_tag("a")) {
        my $linkURL = $token->[1]{href} || "-";
        my $linkText = $parser->get_trimmed_text("/a");
        if ($linkText=~/<image/i) {$linkText="image"}
        print "<small>$linkText</small> "."<b>links to</b> $linkURL<br>"
    }
}
[/code]
orlando_5 posted this at 09:51 — 14th March 2000.
They have: 123 posts
Joined: Dec 1999
Is Perl installed at /usr/bin/perl on your server, as the #! line assumes, or somewhere different?
roBofh posted this at 16:03 — 14th March 2000.
They have: 122 posts
Joined: Jun 2000
What errors do you get? Or does it only print out part of the output?
fairhousing posted this at 16:55 — 14th March 2000.
They have: 1,587 posts
Joined: Mar 1999
Where can I get the module that this line needs:
use LWP::Simple;
roBofh posted this at 18:25 — 14th March 2000.
They have: 122 posts
Joined: Jun 2000
LWP::Simple is a Perl module (part of the libwww-perl distribution). You can get it, and many others, at www.cpan.org
Gage posted this at 18:44 — 14th March 2000.
They have: 17 posts
Joined: Mar 2000
The script seems to run fine, but it does not return any output when it does the actual parsing. All I get is the page with the form, followed by the text "url has been sliced and diced, thus revealing:" and that's it. It doesn't return any of the info from the pages it fetches.
Gage posted this at 19:55 — 14th March 2000.
They have: 17 posts
Joined: Mar 2000
I forgot to give a working example of this script, so here goes: http://www.wdvl.com/Authoring/Languages/Perl/PerlfortheWeb/cgi-bin/parsepage.cgi
orlando_5 posted this at 20:42 — 14th March 2000.
They have: 123 posts
Joined: Dec 1999
I think the problem is how you enter the URL. You must enter it as http://www.dqn.org/ instead of dqn.org. It worked fine for me when I included the http:// in the URL.
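One way to avoid depending on what the visitor types is to add the scheme before calling get(). This is only a minimal sketch: normalize_url is a hypothetical helper that is not part of the WDVL script, and it assumes that plain http:// is the right default for a bare host name.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): prepend "http://"
# when the submitted value has no scheme, so that get() is handed an
# absolute URL. Assumption: http is an acceptable default scheme.
sub normalize_url {
    my ($url) = @_;
    return '' unless defined $url;
    $url =~ s/^\s+|\s+$//g;                             # trim whitespace
    return '' if $url eq '';
    return $url if $url =~ m{^[a-z][a-z0-9+.\-]*://}i;  # scheme already there
    return "http://$url";
}

print normalize_url('dqn.org'), "\n";              # prints http://dqn.org
print normalize_url('http://www.dqn.org/'), "\n";  # prints http://www.dqn.org/
```

In the script this would wrap the form value, along the lines of $fetchURL=normalize_url($cgiobject->param("url"));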
----
www.dqn.org
Justin S posted this at 20:43 — 14th March 2000.
They have: 2,076 posts
Joined: Jun 1999
Wow, that's a cool script. I found the problem: you have to type the full URL for it to work. So "http://www.fireburn.com" would work but "fireburn.com" wouldn't.
* edit: darn orlando, you posted a minute before me *
[This message has been edited by Justin Stayton (edited 14 March 2000).]
Gage posted this at 21:49 — 14th March 2000.
They have: 17 posts
Joined: Mar 2000
Did you guys install the script on your own servers? It works fine on WDVL, but I can't get it to return anything on mine.
Orpheus posted this at 21:57 — 14th March 2000.
They have: 568 posts
Joined: Nov 1999
I think the problem may be with CGI.pm.
Maybe your server is running an old version of it or of LWP. LWP, however, rarely gets updated, and many Perl installations already include it.
If CGI.pm isn't working right, it may return invalid headers...
Gage posted this at 23:31 — 14th March 2000.
They have: 17 posts
Joined: Mar 2000
First off, thanks for the help. I still haven't been able to figure this thing out, and it's been driving me nuts. For simplicity, and to try to figure out why the script isn't working, I resorted to this:
Orpheus posted this at 00:06 — 15th March 2000.
They have: 568 posts
Joined: Nov 1999
You're using Netscape, aren't you?
That just means that the document on the server is sending back no data, which means one of two things:
1) LWP isn't pulling the page
2) Something is wrong with CGI.pm and it isn't generating the HTML correctly.
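The first possibility can be checked directly, because LWP::Simple's get() returns undef when a fetch fails. The following is only a sketch: check_fetch and the two stand-in fetchers are hypothetical names for illustration, and the fetcher is passed in as a code reference so the check can be tried without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: call a fetcher and say whether it returned anything.
# In the real script the fetcher would be LWP::Simple's get().
sub check_fetch {
    my ($fetch, $url) = @_;
    my $page = $fetch->($url);
    return "fetch failed for '$url' (fetcher returned undef)"
        unless defined $page;
    return "fetched " . length($page) . " bytes from '$url'";
}

# Stand-in fetchers, for illustration only
my $works = sub { return "<html><title>Test</title></html>" };
my $fails = sub { return undef };

print check_fetch($works, 'http://www.wdvl.com/'), "\n";
print check_fetch($fails, 'wdvl.com'), "\n";
```

With LWP::Simple loaded, passing \&LWP::Simple::get as the first argument should exercise the real fetch. If that reports a failure, the parsing subs never had a page to work on.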
Gage posted this at 00:18 — 15th March 2000.
They have: 17 posts
Joined: Mar 2000
A prophet? Yup, Netscape it is.
Actually, LWP is working fine. I know that only because I use it in other scripts. How would I be able to find out what version of CGI.pm I have?
Also, if I use mirror instead of get, I actually get the output I want, but I'll pretty much go crazy if I don't find out why this script doesn't work as it is.
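On the version question: a module's version can usually be read through its VERSION method once the module is loaded. A minimal sketch, assuming a hypothetical helper named module_version; the module list is taken from the script's own use lines.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return its version string,
# 'unknown' if it declares no version, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # CGI -> CGI.pm, A::B -> A/B.pm
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? "$version" : 'unknown';
}

for my $module (qw(CGI LWP::Simple HTML::TokeParser)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : 'not installed'), "\n";
}
```

From a shell, perl -MCGI -e 'print $CGI::VERSION' gives the same answer for CGI.pm alone.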
Orpheus posted this at 02:41 — 15th March 2000.
They have: 568 posts
Joined: Nov 1999
You could just stop using CGI.pm.
orlando_5 posted this at 07:53 — 15th March 2000.
They have: 123 posts
Joined: Dec 1999
Is your website on a Cobalt server? Cobalt servers are known to have problems with CGI scripts because of the CGI wrapper. That might be the problem. Check with the server admin.
I do not think the script is the problem...