Searching a PDF File from an HTML Page?

They have: 54 posts

Joined: Mar 2000

I have a client who wants to be able to search a PDF catalog from an HTML page.

*scratching head here* Do any of you know how to do this?

Can a PDF file have a search in it?

sersun's picture

They have: 32 posts

Joined: Aug 2001

Hmm, I know it's *possible* because Yahoo and Google index PDFs now.

Google has a free site search utility at http://www.google.com/services/free.html, if appearance isn't that important.

sersun

They have: 54 posts

Joined: Mar 2000

Thank you for your reply. From what I gleaned at Google, its free search doesn't cover the entire site, and the user would be taken out of their site and into Google's. I don't think my client would like that. I don't mind paying for a script if it works.

Mark Hensler's picture

He has: 4,048 posts

Joined: Aug 2000

Have you tried searching http://hotscripts.com or http://resourceindex.com?

A search for "search PDF" at hotscripts came up with some results.

Suzanne's picture

She has: 5,507 posts

Joined: Feb 2000

When in doubt, go to the source!

PDFs are searchable if designed properly; check out the Acrobat section for more information.

Otherwise, in most search scripts you can specify what types of files to return. Simply specify .pdf files.

There are also .pdf servers, but that may be a further step down the road. Again, the Adobe site would be the best place to find this information.

Smiling Suzanne

They have: 54 posts

Joined: Mar 2000

Thank you...thank you. I'm not too familiar with Acrobat Reader, but I did find Acrobat's search button.

My dilemma is this... My client wants a huge ole catalog listing parts by number and name put online in PDF, and wants it searchable from an HTML page. The Acrobat Search would probably work within the PDF file. But how do I search for the PDF file from an HTML page?

In other words, the HTML page would bring up the catalog number, the user would click on the catalog number, and then would link directly to the PDF file for that catalog number which would show a drawing and have specifications of the part.

This part of my client's site was a big surprise to me! They are putting together the PDF files themselves (and there must be 1000 of them), thank God!

Mark Hensler's picture

He has: 4,048 posts

Joined: Aug 2000

So, does it really need to be searchable? Or does it just need to be indexed?

If you need it to be searchable, you're going to need to use a server-side language.

Mark Hensler
If there is no answer on Google, then there is no question.

Suzanne's picture

She has: 5,507 posts

Joined: Feb 2000

Argh, lost my post by accident.

Most sensible approach -- name the pdf files appropriately:

partno_name_date.pdf

And then have the server-side or hosted search engine look for only .pdf files, or only files in one folder (which would house ALL the .pdf files). It will look at the file names, and return whatever people are looking for within an html page as clickable links.
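As a rough sketch of that idea in Python (any server-side language would do), the search below looks only at file names in a single folder and turns the matches into clickable links. The function names, the `/catalog/` URL prefix, and the sample file names in the usage note are made up for illustration; it also assumes the files really do follow the partno_name_date.pdf pattern.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def search_pdfs(folder, query):
    """Return PDF files in `folder` whose name contains `query` (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return []
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and needle in p.stem.lower()
    )


def results_as_html(paths, base_url="/catalog/"):
    """Render the matching files as an HTML list of links (base_url is hypothetical)."""
    items = []
    for p in paths:
        # Show the file name without its extension, underscores turned into spaces.
        label = p.stem.replace("_", " ")
        items.append(f'<li><a href="{base_url}{quote(p.name)}">{escape(label)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling `results_as_html(search_pdfs("catalog", "widget"))` would list every PDF in the `catalog` folder with "widget" in its name. Only file names are checked; the contents of the PDFs are never opened.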

Does that make sense?

The other route is to index the results of an Acrobat search to a file and use that to generate the links, but that's a little more complex. It depends if the client needs the .pdf files themselves to be searchable (the content of them), or merely for them to be easily located.
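For the content route, here is a hedged sketch of what "index to a file" could look like. It does not use Acrobat's own search; it swaps in a plain word index, and it assumes the text of each PDF has already been extracted by some other tool into a dictionary of file name to text. The names `build_index` and `lookup` are illustrative only.

```python
import re
from collections import defaultdict


def build_index(texts):
    """Map each lowercase word to the set of PDF names whose text contains it."""
    index = defaultdict(set)
    for pdf_name, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(pdf_name)
    return index


def lookup(index, query):
    """Return the PDF names that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Intersect the file sets of all query words; unknown words match nothing.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)
```

The index would be built once whenever the catalog changes and could be saved to disk, so each search is only a lookup rather than a pass through 1000 files.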

Smiling Suzanne

They have: 54 posts

Joined: Mar 2000

You are great! I REALLY appreciate the help here. I have posted a few times when I've been in trouble and have definitely received some professional help. I hope that someday I'll be able to reciprocate.

Suzanne, yes, it makes a lot of sense. The client just needs the PDF files to be easily located. The user would run a search for a part number or part name, and that part would show up in the search results. Then the user would click on a result and be taken straight to the PDF for that part.

Let me make sure I thoroughly understand.

First, when I receive the PDF files, I would need to index these 1000 pages in one file (let's call the file "Parts Index" for simplicity's sake) and link each item to the appropriate PDF file.
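If the "Parts Index" is just a page of links, it probably doesn't have to be typed by hand. A minimal sketch in Python, assuming the files are named partno_name_date.pdf as suggested above; the function names and the `/catalog/` URL prefix are hypothetical.

```python
from html import escape
from urllib.parse import quote


def parse_name(filename):
    """Split 'partno_name_date.pdf' into (partno, name, date).

    Raises ValueError if the name has fewer than two underscores.
    """
    stem = filename.rsplit(".", 1)[0]
    partno, rest = stem.split("_", 1)   # part number is everything before the first _
    name, date = rest.rsplit("_", 1)    # date is everything after the last _
    return partno, name, date


def build_parts_index(filenames, base_url="/catalog/"):
    """Return an HTML table linking each part number to its PDF."""
    rows = []
    for fn in sorted(filenames):
        partno, name, _date = parse_name(fn)
        rows.append(
            f'<tr><td><a href="{base_url}{quote(fn)}">{escape(partno)}</a></td>'
            f"<td>{escape(name.replace('_', ' '))}</td></tr>"
        )
    return (
        "<table>\n<tr><th>Part no.</th><th>Name</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>"
    )
```

The returned string can be saved as the Parts Index page and regenerated whenever the client delivers new PDFs.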

Then, I would set up my HTML page with the search form. The code I have for a search is:

!--webbot bot="Search" S-Index="All" S-Fields
S-Text="Search for:" I-Size="40" S-Submit="Start Search" S-Clear="Reset"
S-TimestampFormat="%m/%d/%Y" TAG="BODY" b-useindexserver="0" --

(greater and lesser omitted on purpose)

What do I change in this code to make it search only the "Parts Index" file? Logic tells me that S-Index="All" would need to be changed, but to what?

Suzanne's picture

She has: 5,507 posts

Joined: Feb 2000

You'll have to pop into the scripting forum for that one. I'm a Perl Grrl, and anything bot/ASP/VBScript makes me go blank.

As for indexing, it depends on the search script you use: it may just search the documents in a particular folder, or it may build an index of them.

Glad you've got it this far, keep on trucking!

Smiling Suzanne
