Selective searching?

They have: 334 posts

Joined: Dec 1999

I'm writing a small CGI search script for my site and am running into a problem that I'm not able to solve. I'd like the search to return the link, the page title and the first few lines of content. That's not a problem in itself and I have it working okay.

However, the problem arises in what gets returned for the first few lines of content. I'm using the familiar thin left navigational column for internal links, with the main site content in the larger right-side box. The search returns the link text, since that's the first content it comes to, so it returns the same description for every page. I can of course turn off the description feature and have the search return just the page title, but I'd rather not go that route, and I don't want to hand-edit over 1000 HTML pages to add a META tag describing each page's contents. I'm looking for some sort of Perl or batch-edit solution.

So, can I make the search selective about which areas of the page it searches? Can I hide the links column from the search script so that it returns better results? If I move the navigational links into a separate text or HTML file and load it into the page via SSI (like <!--#include file="navigation.htm" -->), would that work?
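A minimal sketch of the "hide the links column" idea, assuming the navigation is wrapped in a pair of marker comments. The marker strings and the `strip_nav` name are hypothetical, not something the site already has. The search script removes everything between the markers before it looks for a description:

```perl
use strict;
use warnings;

# Hypothetical marker comments; they would have to be added around the
# navigation column in each page (or in the shared template).
my $NAV_START = '<!-- nav-start -->';
my $NAV_END   = '<!-- nav-end -->';

# Return the page source with every marked navigation region removed.
sub strip_nav {
    my ($html) = @_;
    # \Q...\E quotes the markers literally; /s lets . match newlines;
    # .*? is non-greedy, so each start marker pairs with the nearest end.
    $html =~ s/\Q$NAV_START\E.*?\Q$NAV_END\E//gs;
    return $html;
}

my $page = <<'HTML';
<body>
<!-- nav-start -->
<a href="/">Home</a> <a href="/faq.html">FAQ</a>
<!-- nav-end -->
<p>Welcome to the widgets page.</p>
</body>
HTML

print strip_nav($page);
```

The SSI route should have a similar effect if the search script reads the .html files straight from disk, since the include directive is only expanded by the web server when the page is served. That depends on how the script gets at the pages, so it is worth testing on one page first.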

They have: 161 posts

Joined: Dec 1999

I'm not totally sure I understand the problem in this case... is it that each HTML page has a common section that you don't want displayed as the "description", and you'd rather skip ahead to where the actual content begins?

If so, this solution might work well for you:

Code Sample:

open FILE, "some.html" or
  die "can't open some.html: $!";
{ # this slurps the ENTIRE file into
  # a scalar variable (quickly, too!)
  local $/;
  $file = <FILE>;
}
close FILE;

$endcommon = << 'END';
This should be a string holding the END of
the "common sidebar" thing, or whatever.  As
soon as this string is found, the VERY NEXT
character in the file will be a candidate for
the description.
END

if (($p = index($file,$endcommon)) != -1) {
  # index() gives the START of the marker,
  # so skip past the marker itself
  $p += length($endcommon);
  $descrip = '';
  while (length($descrip) < 200 and $p < length($file)) {
    # get next 200 chars
    $chunk = substr($file,$p,200);
    $p += 200;
    # NOTE: THIS IS A VERY POOR HTML TAG
    # STRIPPING ROUTINE
    # it "breaks" for:
    # <img src="foo.gif" alt=" ---> ">
    # and for a tag split across two chunks
    $chunk =~ s/<[^>]*>//g;
    $descrip .= $chunk;
  }
  $descrip .= "..." if length($descrip);
}

$descrip ||= "No description";
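The same idea can be wrapped in a subroutine that strips tags from everything after the marker in one pass and then truncates, so no tag gets cut in half at a 200-character boundary. This is only a sketch: the name `description_after` and the `<!-- end sidebar -->` marker are made up for illustration, and the tag stripper is still the naive one.

```perl
use strict;
use warnings;

# Return up to $max characters of plain text that follow $marker in $html,
# or "No description" if the marker is missing or nothing follows it.
sub description_after {
    my ($html, $marker, $max) = @_;
    my $p = index($html, $marker);
    return "No description" if $p == -1;

    # Everything after the end of the marker.
    my $text = substr($html, $p + length($marker));
    $text =~ s/<[^>]*>/ /g;   # naive tag stripper; each tag becomes a space
    $text =~ s/\s+/ /g;       # collapse runs of whitespace
    $text =~ s/^ //;          # trim the single leading space, if any
    $text =~ s/ $//;          # trim the single trailing space, if any
    return "No description" if $text eq '';

    return $text if length($text) <= $max;
    return substr($text, 0, $max) . "...";
}

my $html = '<td>nav</td><!-- end sidebar --><h1>Widgets</h1><p>All about   widgets.</p>';
print description_after($html, '<!-- end sidebar -->', 200), "\n";
```

Replacing each tag with a space rather than nothing keeps words on either side of a tag from running together.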

------------------
--
MIDN 4/C PINYAN, NROTCURPI, US Naval Reserve

[This message has been edited by japhy (edited 16 December 1999).]

They have: 161 posts

Joined: Dec 1999

To see a much better example (I fixed the code up considerably), please go to this URL: http://www.pobox.com/~japhy/perl/forum_examples

The directory is the one that says "maverick". In there are a couple mock HTML files, and a program that displays their titles, descriptions, and a link to each.
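For readers who can't get to that directory, here is a rough sketch of what a title/description/link listing can look like. It is not the program at the URL above; `page_title` and `result_item` are hypothetical names, and the pages are in-memory strings so the example runs on its own.

```perl
use strict;
use warnings;

# Return the text of the first <title> element, or $fallback if the
# page has no usable title.
sub page_title {
    my ($html, $fallback) = @_;
    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        my $t = $1;
        $t =~ s/\s+/ /g;      # collapse whitespace, including newlines
        $t =~ s/^ | $//g;     # trim one leading and one trailing space
        return $t if length $t;
    }
    return $fallback;
}

# Format one search result as an HTML list item.
sub result_item {
    my ($url, $title, $descrip) = @_;
    return qq{<li><a href="$url">$title</a><br>$descrip</li>\n};
}

# Demo data; a real script would read each file from disk instead.
my %pages = (
    'widgets.html' => "<html><head><TITLE>\n All  About Widgets </TITLE></head></html>",
    'untitled.html' => "<html><body><p>No title here.</p></body></html>",
);

for my $url (sort keys %pages) {
    my $title = page_title($pages{$url}, $url);
    print result_item($url, $title, "No description");
}
```

Falling back to the file name means a page without a title still gets a clickable link in the results.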

I'll be working on it -- email me (or post in this forum) any questions, comments, concerns you have.


They have: 334 posts

Joined: Dec 1999

Thanks very much.

They have: 297 posts

Joined: Apr 1999

Nice coding, Japhy.

May I invite you to participate in TWF's BB project at www.Boardzilla.org?

Later,

Malte

------------------
Malte Ubl - www.Boardzilla.org
Communication: public<->programmers
of the Boardzilla BB
