Which language for archiving articles?

They have: 14 posts

Joined: Dec 2003

Tried posting this on the web database forum with no luck...

I have a bunch of newspaper articles that I'd like to put online as a searchable archive. The only search parameter I think I'll need (in addition to full text) is a category field that is populated by myself. That is, people can search the database for certain words or phrases that exist within the article text; or, they can choose from a set list of topics and are given every article that fits that topic. I determine which topics are appropriate for each article, and an article can have more than one topic. What type of database should I use for this? I have experience only with PHP, but I'm eager to explore other ways of doing this.

thanks,
Matt

-Matt
Iraq Media Developments Newsletter...and telecom stuff too.

Suzanne

She has: 5,507 posts

Joined: Feb 2000

The type of database isn't really what matters; I think that's where you're getting stuck. Any database will handle this equally well as long as it is set up for multiple users (i.e. not a desktop, single-user database like Access).

MySQL is a popular, robust open-source database, and you can also explore proprietary databases such as Microsoft SQL Server and Oracle.
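To show why the brand of database matters little here, this is a minimal sketch of the structure Matt describes. It uses Python's built-in sqlite3 module as a stand-in for MySQL or any other relational database. The table names, column names, and sample rows are all hypothetical. The key piece is the link table, which lets one article carry several editor-assigned topics. The LIKE search is only a naive stand-in for real full-text search. MySQL, for example, has its own FULLTEXT indexes.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL or any other
# multi-user relational database; the SQL is deliberately plain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);
CREATE TABLE topics (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Link table: one row per (article, topic) pair, so an article can
-- have any number of editor-assigned topics.
CREATE TABLE article_topics (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    topic_id   INTEGER NOT NULL REFERENCES topics(id),
    PRIMARY KEY (article_id, topic_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", [
    (1, "Election coverage", "Polling stations opened early on Sunday."),
    (2, "New mobile network", "A telecom licence was granted this week."),
])
conn.executemany("INSERT INTO topics VALUES (?, ?)",
                 [(1, "politics"), (2, "telecom"), (3, "media")])
conn.executemany("INSERT INTO article_topics VALUES (?, ?)",
                 [(1, 1), (1, 3), (2, 2), (2, 3)])


def articles_by_topic(topic):
    """Every article tagged with the given topic, as (id, title) pairs."""
    return conn.execute("""
        SELECT a.id, a.title
        FROM articles AS a
        JOIN article_topics AS link ON link.article_id = a.id
        JOIN topics AS t ON t.id = link.topic_id
        WHERE t.name = ?
        ORDER BY a.id""", (topic,)).fetchall()


def articles_containing(phrase):
    """Naive text search: substring match on the body (LIKE is
    case-insensitive for ASCII in SQLite); not a real full-text index."""
    return conn.execute(
        "SELECT id, title FROM articles WHERE body LIKE ? ORDER BY id",
        ("%" + phrase + "%",)).fetchall()


print(articles_by_topic("media"))      # both articles
print(articles_containing("licence"))  # only article 2
```

The same three tables and queries would work almost unchanged in MySQL, SQL Server, or Oracle, which is the point being made above.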

You'd probably be better off indexing the articles and searching the index instead of searching the database every time.

Matt

Suzanne wrote: You'd probably be better off indexing the articles and searching the index instead of searching the database every time.

Interesting... how do you mean?

druagord

He has: 335 posts

Joined: May 2003

Quote:
I have experience only with PHP, but I'm eager to explore other ways of doing this.

Why would you want something else? The fastest route to somewhere is the route you already know. IMO, PHP is faster for development than most other server-side languages.

IF, ELSE, WHILE: isn't that what life is all about?

Matt

druagord wrote: Why would you want something else? The fastest route to somewhere is the route you already know...

Curiosity, the challenge of learning something new, something else to add to my resume... and I want to do it right, not fast.

Suzanne

If you index the articles, and search the index, you are searching one document. It is much faster and reduces wear and tear on your server.
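One common way to build such a single index is an inverted index, which maps each word to the articles that contain it. This is a minimal Python sketch. It assumes the article texts are already loaded into a dictionary, a hypothetical stand-in for the database. A search touches only the index, and the ids it returns are then used to fetch the full articles.

```python
import re
from collections import defaultdict

# Hypothetical article store: id -> full text (in practice, the database).
ARTICLES = {
    1: "Polling stations opened early in Baghdad.",
    2: "A telecom licence was granted this week.",
    3: "The telecom regulator published new rules.",
}


def tokenize(text):
    """Lower-case words; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """The single index 'document': each word maps to the ids using it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search(index, query):
    """Ids of articles containing every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids adding empty entries to the defaultdict.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(ARTICLES)  # rebuild whenever articles are added
print(search(INDEX, "telecom"))        # articles 2 and 3
print(search(INDEX, "telecom rules"))  # only article 3
```

A dictionary lookup per query word replaces scanning every article, which is where the speed and the lighter server load come from. The trade-off is that the index must be rebuilt when content changes.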

Here's the thing: if you have a small database, it's not really going to matter. If you have a database of constantly changing data, like a shopping cart with inventory control, then you're going to use the database search and not worry about it. But when you have large amounts of predominantly static information, it can be faster to search a single index, even if that index is quite large.

Another way to do it is to build several indexes, so that you have a title index, an author index, a keyword index, et cetera, and then offer a database search as the advanced search. Most people will use the quick index searches and get the results they need; others with quirkier queries may choose to search the database itself.
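This is a rough sketch of that split. The records and field names are hypothetical. The quick searches are single lookups in prebuilt title, author, and keyword indexes. The advanced search scans every article's full text, standing in for a query against the database.

```python
# Hypothetical records; in practice these come from the articles table.
ARTICLES = [
    {"id": 1, "title": "Election coverage", "author": "A. Writer",
     "keywords": ["politics", "media"], "body": "Polling stations opened early."},
    {"id": 2, "title": "New mobile network", "author": "B. Writer",
     "keywords": ["telecom"], "body": "A licence was granted this week."},
]


def build_field_indexes(articles):
    """Separate lookup tables: title word, author, keyword -> article ids."""
    indexes = {"title": {}, "author": {}, "keyword": {}}
    for a in articles:
        for word in a["title"].lower().split():
            indexes["title"].setdefault(word, set()).add(a["id"])
        indexes["author"].setdefault(a["author"].lower(), set()).add(a["id"])
        for kw in a["keywords"]:
            indexes["keyword"].setdefault(kw.lower(), set()).add(a["id"])
    return indexes


def quick_search(indexes, field, term):
    """Fast path: one dictionary lookup in one prebuilt index."""
    return sorted(indexes[field].get(term.lower(), set()))


def advanced_search(articles, phrase):
    """Slow path: scan every full text, standing in for a database query."""
    phrase = phrase.lower()
    return [a["id"] for a in articles if phrase in a["body"].lower()]


INDEXES = build_field_indexes(ARTICLES)  # rebuild when articles are added
print(quick_search(INDEXES, "keyword", "telecom"))  # article 2
print(advanced_search(ARTICLES, "licence was"))     # article 2
```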

It really, truly doesn't matter what server-side language you use for a project of this nature. Because of the nature of the data, you can use an XML database as easily as an SQL database or any database accessed through ODBC.

Everything depends on the data/content, and all decisions should be based on what will serve it to the users best, not on what will be a personal challenge to the developer or any other Ego Stroking Criterion.

Matt

Suzanne wrote: If you index the articles, and search the index, you are searching one document. It is much faster and reduces wear and tear on your server.

I see... but how do you index? How do I tell it to retrieve a single record (article) when they're all in the same document?

Suzanne wrote: All decisions should be based on what will serve it to the users best, not on what will be a personal challenge to the developer or any other Ego Stroking Criterion.

Exactly. As I said before, I want to do it correctly, not quickly, and if learning something new is necessary to do the job right, I'm open to the challenge. Is that egotistical? Would any of us be here were we not driven by curiosity?

druagord

Quote: Curiosity, the challenge of learning something new, something else to add to my resume... and I want to do it right, not fast.

You are right, curiosity and challenge are good motivations. I thought like you for my first 2 years as a programmer, but then found out that I had only a little knowledge of many languages. Now I've decided to dedicate myself to PHP, and I know it well, so I can do the job right and faster :), which is the way the people paying for the job want it.

And programming is always a challenge to me.

Suzanne

Curiosity is different from racking up resume skills. ;)

Non-database meaning of index:

An index, in this sense, is usually a separate file that contains the specifics for each article, i.e. its unique file number, title, authors, assigned keywords, et cetera. You set up the application to search that first (don't forget to reindex when you add new articles), and set the extended or advanced search to search the actual database. In some cases the index is another table in the database; in others it's a flat file that's parsed.
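The unique number in each entry is what lets a single record come back out of one index file. This is a minimal sketch that assumes a hypothetical tab-separated layout (id, title, authors, keywords) and uses Python's csv module for parsing. The search runs against the index only. The ids it returns would then be used to load the full articles from the database.

```python
import csv
import os
import tempfile

# Hypothetical flat-file layout, one line per article:
# unique id <TAB> title <TAB> authors (";"-separated) <TAB> keywords (";"-separated)
FIELDS = ["id", "title", "authors", "keywords"]


def write_index(path, articles):
    """(Re)build the index file; rerun whenever new articles are added."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for a in articles:
            writer.writerow([a["id"], a["title"],
                             ";".join(a["authors"]), ";".join(a["keywords"])])


def load_index(path):
    """Parse the flat file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, fieldnames=FIELDS, delimiter="\t")
        return [{"id": int(row["id"]), "title": row["title"],
                 "authors": row["authors"].split(";"),
                 "keywords": row["keywords"].split(";")}
                for row in reader]


def find_ids(entries, keyword):
    """Search the index only; each returned id points at one full article."""
    keyword = keyword.lower()
    return [e["id"] for e in entries
            if keyword in (k.lower() for k in e["keywords"])]


# Hypothetical sample articles written to a temporary index file.
articles = [
    {"id": 101, "title": "Election coverage", "authors": ["A. Writer"],
     "keywords": ["politics", "media"]},
    {"id": 102, "title": "New mobile network",
     "authors": ["B. Writer", "C. Writer"], "keywords": ["telecom", "media"]},
]
path = os.path.join(tempfile.mkdtemp(), "articles.idx")
write_index(path, articles)
ENTRIES = load_index(path)
print(find_ids(ENTRIES, "telecom"))  # the id of the one matching article
```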

Another way to "index" is to save the results of specific queries as regular HTML pages full of hyperlinks and let users wander through them. Recreate those pages when new content is added. Popular content management systems do this. The site search would then search those "static" pages, which link to the dynamic documents. You could also offer common searches as ready-made links to help people find what they want.
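This is a minimal sketch of pre-generating such pages. It builds one static HTML page per topic, and each page links to the dynamic article page. The data, the file names, and the `article.php?id=` URL are hypothetical. The pages would be regenerated and rewritten to disk whenever new content is added.

```python
import html
import os

# Hypothetical data: article id -> (title, topics).
ARTICLES = {
    1: ("Election coverage", ["politics", "media"]),
    2: ("New mobile network", ["telecom", "media"]),
    3: ("Radio & TV listings", ["media"]),
}


def topic_pages(articles):
    """Return {filename: html_text}, one static page per topic."""
    by_topic = {}
    for article_id, (title, topics) in sorted(articles.items()):
        for topic in topics:
            by_topic.setdefault(topic, []).append((article_id, title))
    pages = {}
    for topic, items in sorted(by_topic.items()):
        # Each link points at the dynamic page for one article.
        links = "\n".join(
            f'<li><a href="article.php?id={i}">{html.escape(t)}</a></li>'
            for i, t in items)
        pages[f"topic-{topic}.html"] = (
            f"<h1>{html.escape(topic)}</h1>\n<ul>\n{links}\n</ul>\n")
    return pages


def write_pages(pages, directory):
    """Write (or overwrite) every generated page into directory."""
    os.makedirs(directory, exist_ok=True)
    for filename, content in pages.items():
        with open(os.path.join(directory, filename), "w", encoding="utf-8") as f:
            f.write(content)


PAGES = topic_pages(ARTICLES)
```

Calling `write_pages(topic_pages(ARTICLES), "static")` after each update would refresh the browsable pages. The site search then only needs to cover those static files.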

In a database, indexes are how searches are sped up, and they are set up at the database level. If you know that most searches will be on specific columns, you would index those columns in the table.
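This is a small way to see a database index at work, again using sqlite3 as a stand-in. The example assumes, hypothetically, that most searches filter on the author column, so that column gets an index. SQLite's EXPLAIN QUERY PLAN then reports whether a query uses it. In MySQL the corresponding tools are CREATE INDEX and EXPLAIN. The exact plan wording differs between systems and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    author    TEXT,
    published TEXT
);
-- Assumption: most searches filter on author, so only that column is indexed.
CREATE INDEX idx_articles_author ON articles (author);
""")


def plan(sql, params=()):
    """Return the human-readable detail lines of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


INDEXED = plan("SELECT id FROM articles WHERE author = ?", ("A. Writer",))
UNINDEXED = plan("SELECT id FROM articles WHERE published = ?", ("2003-12-01",))
print(INDEXED)    # mentions idx_articles_author
print(UNINDEXED)  # a full scan of the table
```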

For a better explanation, see http://www.mysql.com/doc/en/MySQL_indexes.html, and actually that whole section on how to optimize your database for performance.

I believe that any relational database will give you the results you need. In order to satisfy your curiosity, research your options (and their related costs) and proceed from there to choose one. Open-source databases typically have more online support available and are less pricey.

Matt

I wish I had discovered this place before finishing my most recent site. Much more helpful than Usenet, and without the attitude.

Suzanne

Hey now, I work hard on my attitude! Don't be bursting my bubble! ;)

I do hope this is of help to you. There are many very solid contributors here and we're always happy to have more.
