which language for archiving articles?
tried posting this on the web database forum with no luck...
I have a bunch of newspaper articles that I'd like to put online as a searchable archive. The only search parameter I think I'll need (in addition to full text) is a category field that I populate myself. That is, people can search the database for certain words or phrases that exist within the article text, or they can choose from a set list of topics and be given every article that fits that topic. I determine which topics are appropriate for each article, and an article can have more than one topic. What type of database should I use for this? I have experience only with PHP, but I'm eager to explore other ways of doing this.
thanks,
Matt
-Matt
Iraq Media Developments Newsletter...and telecom stuff too.
Suzanne posted this at 03:27 — 11th December 2003.
She has: 5,507 posts
Joined: Feb 2000
The type of database isn't relevant; I think that's your problem. All databases will handle this equally well as long as they are set up for multiple users (i.e. not desktop single-user databases like Access).
MySQL is popular as a robust open-source database, and you can also explore proprietary databases such as SQL Server (Microsoft) and Oracle.
You'd probably be better off indexing the articles and searching the index instead of querying the database all the time.
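For the layout described in the question (full text plus a hand-assigned list of topics, with more than one topic allowed per article), a minimal sketch of a relational schema might look like the following. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL or any other relational database; the table names, column names, helper functions, and sample rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real server database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: a junction table lets one article carry several topics.
conn.executescript("""
    CREATE TABLE articles (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT NOT NULL
    );
    CREATE TABLE topics (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE article_topics (
        article_id INTEGER NOT NULL REFERENCES articles(id),
        topic_id   INTEGER NOT NULL REFERENCES topics(id),
        PRIMARY KEY (article_id, topic_id)
    );
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "New radio station opens", "A new radio station began broadcasting this week."),
        (2, "Mobile network expands", "The mobile phone network now covers three more cities."),
    ],
)
conn.executemany("INSERT INTO topics (id, name) VALUES (?, ?)", [(1, "radio"), (2, "telecom")])
conn.executemany("INSERT INTO article_topics VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def articles_for_topic(topic_name):
    """Return the titles of every article tagged with the given topic."""
    rows = conn.execute(
        """
        SELECT a.title
        FROM articles a
        JOIN article_topics link ON link.article_id = a.id
        JOIN topics t ON t.id = link.topic_id
        WHERE t.name = ?
        ORDER BY a.id
        """,
        (topic_name,),
    )
    return [title for (title,) in rows]


def articles_containing(phrase):
    """Naive full-text search: a substring match on the article body."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE body LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [title for (title,) in rows]
```

The substring match scans every row, which is fine for a small archive and is exactly the part that a separate index is meant to relieve once the archive grows.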
msburton posted this at 12:21 — 11th December 2003.
They have: 14 posts
Joined: Dec 2003
interesting...how do you mean?
druagord posted this at 15:11 — 11th December 2003.
He has: 335 posts
Joined: May 2003
why would you want something else? the fastest route to somewhere is the route that you know. IMO php is faster for development than most other server-side languages.
IF , ELSE , WHILE isn't that what life is all about
msburton posted this at 15:59 — 11th December 2003.
curiosity, challenge of learning something new, something else to add to my resume...and i want to do it right, not fast.
Suzanne posted this at 16:07 — 11th December 2003.
If you index the articles, and search the index, you are searching one document. It is much faster and reduces wear and tear on your server.
Here's the thing -- if you have a small database, it's not really going to matter. If you have a database of constantly changing data, like a shopping cart with inventory control, then you're going to use the database search and not worry about it. But when you have large amounts of predominantly static information, it can be faster to have a single index to search, even if that index is quite large.
Another way to do it is to build several indexes, one per kind of item, so that you have a title index, an author index, a keyword index, et cetera, and then offer a database search as the advanced search. Most people will use the quick index searches and get the results they need; others with quirkier queries may choose to search the database itself.
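As a rough illustration of the per-field idea, here is a minimal in-memory sketch in Python. Each index maps a lowercased word to the set of article numbers that contain it. The records, field names, and function names are hypothetical, and a real archive would persist the indexes rather than rebuild them on every request.

```python
import re
from collections import defaultdict

# Made-up sample records; the field names are assumptions for this sketch.
ARTICLES = {
    1: {"title": "New radio station opens", "author": "A. Writer", "keywords": ["radio", "media"]},
    2: {"title": "Mobile network expands", "author": "B. Reporter", "keywords": ["telecom"]},
    3: {"title": "Radio licences reviewed", "author": "A. Writer", "keywords": ["radio", "regulation"]},
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(articles):
    """Build one word -> {article ids} mapping per searchable field."""
    indexes = {
        "title": defaultdict(set),
        "author": defaultdict(set),
        "keyword": defaultdict(set),
    }
    for article_id, record in articles.items():
        for word in tokenize(record["title"]):
            indexes["title"][word].add(article_id)
        for word in tokenize(record["author"]):
            indexes["author"][word].add(article_id)
        for keyword in record["keywords"]:
            indexes["keyword"][keyword.lower()].add(article_id)
    return indexes


def quick_search(indexes, field, term):
    """Look a single term up in one field's index; returns sorted article ids."""
    return sorted(indexes[field].get(term.lower(), set()))


INDEXES = build_indexes(ARTICLES)
```

A quick search is then a single dictionary lookup, for example `quick_search(INDEXES, "keyword", "radio")`, and only the advanced search needs to touch the database.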
It really, truly, doesn't matter what server-side language you use for a project of this nature. Because of the nature of the data, you can use an XML database as easily as a SQL database, or any database you reach through ODBC.
Everything depends on the data/content, and all decisions should be based on what will serve that to the users the best way, not what will be a personal challenge to the developer, nor any other Ego Stroking Criterion.
msburton posted this at 22:24 — 11th December 2003.
I see...but how do you index? how do i tell it to retrieve a single record (article) when they're all in the same document?
exactly. as i said before, i want to do it correctly, not quickly, and if learning something new is necessary to do the job right, i'm open to the challenge. is that egotistical? would any of us be here were we not driven by curiosity?
druagord posted this at 16:16 — 11th December 2003.
you are right, curiosity and challenge are good motivation, and i was thinking like you for my first 2 years as a programmer, but then found out that i had only a little knowledge about many languages. now i have decided to dedicate myself to php and i have a lot of knowledge in it; it allows me to do the job right, faster, which is the way people paying for the job want it.
And programming is always a challenge to me.
Suzanne posted this at 23:53 — 11th December 2003.
Curiosity is different from racking up resume skills.
Non-database meaning of index:
An index is usually a separate file that contains the specifics for each article, i.e. unique file number, title, authors, set keywords, et cetera. Each hit in the index carries the article's unique number, which is what you use to retrieve that single record. You set up the application to search the index first (don't forget to reindex when you add new articles), and set the extended or advanced search to search the actual database. In some cases the index is another table in the database, and in others it's a flat file that's parsed.
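A minimal sketch of the flat-file variant, in Python: the index is a small JSON file holding only the unique number, title, and keywords of each article, and a search returns unique numbers that are then used to fetch single records. The file name, field names, and sample records are assumptions; a dictionary stands in for the real database.

```python
import json
import os
import tempfile

# Made-up stand-in for the article store; in practice this would be the database.
ARTICLES = {
    101: {"title": "New radio station opens", "keywords": ["radio", "media"], "body": "(full text)"},
    102: {"title": "Mobile network expands", "keywords": ["telecom"], "body": "(full text)"},
}


def reindex(articles, index_path):
    """Rewrite the flat index file; run this whenever articles are added."""
    entries = [
        {"id": article_id, "title": record["title"], "keywords": record["keywords"]}
        for article_id, record in sorted(articles.items())
    ]
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle)


def search_index(index_path, term):
    """Search only the index file and return the matching article ids."""
    term = term.lower()
    with open(index_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    return [
        entry["id"]
        for entry in entries
        if term in entry["title"].lower()
        or term in [keyword.lower() for keyword in entry["keywords"]]
    ]


def fetch_article(articles, article_id):
    """Retrieve one full record by the unique id found in the index."""
    return articles[article_id]


# Hypothetical location for the index file.
INDEX_PATH = os.path.join(tempfile.mkdtemp(), "article_index.json")
reindex(ARTICLES, INDEX_PATH)
```

An article added after the last `reindex` call will not show up in `search_index` until the index is rebuilt, which is the reindexing caveat mentioned above.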
Another way to "index" is to run specific queries ahead of time and save the results as regular, hyperlinked HTML pages that the user can wander through. Recreate those pages when new content is added. Popular content management systems do this. The search would then search those "static" pages and link to the dynamic documents. You could also include common searches as an option to help people find what they want.
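One way this could be sketched in Python: write one static HTML page per topic, each linking to the dynamic article pages, and rerun the function whenever content is added. The sample records, the output directory, and the `/article.php?id=` link format are all hypothetical.

```python
import html
import os
import tempfile

# Made-up sample records for illustration.
ARTICLES = [
    {"id": 1, "title": "New radio station opens", "topics": ["radio", "media"]},
    {"id": 2, "title": "Mobile network expands", "topics": ["telecom"]},
    {"id": 3, "title": "Radio licences reviewed", "topics": ["radio"]},
]


def render_topic_page(topic, articles):
    """Build the HTML for one topic page; the link format is an assumption."""
    items = "\n".join(
        f'<li><a href="/article.php?id={a["id"]}">{html.escape(a["title"])}</a></li>'
        for a in articles
        if topic in a["topics"]
    )
    return f"<html><body><h1>{html.escape(topic)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"


def rebuild_static_pages(articles, out_dir):
    """Regenerate one static page per topic; returns the paths written."""
    topics = sorted({topic for a in articles for topic in a["topics"]})
    paths = []
    for topic in topics:
        path = os.path.join(out_dir, f"{topic}.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(render_topic_page(topic, articles))
        paths.append(path)
    return paths


# Hypothetical output directory; a real site would write under the web root.
OUT_DIR = tempfile.mkdtemp()
PAGES = rebuild_static_pages(ARTICLES, OUT_DIR)
```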
For databases, indices are structures that speed up searches, and they are set up at the database level. If you know that most of your searches will be on specific columns, you would index those columns in the table.
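To make the database-level meaning concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in (MySQL also accepts a `CREATE INDEX` statement, though its tooling for inspecting query plans differs). The table, column, and index names are hypothetical, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO articles (title, author, body) VALUES (?, ?, ?)",
    [
        ("New radio station opens", "A. Writer", "(full text)"),
        ("Mobile network expands", "B. Reporter", "(full text)"),
        ("Radio licences reviewed", "A. Writer", "(full text)"),
    ],
)

QUERY = "SELECT id, title FROM articles WHERE author = ? ORDER BY id"


def query_plan(connection, sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Without an index, the author lookup has to scan the whole table.
plan_before = query_plan(conn, QUERY, ("A. Writer",))

# Index the column that most searches are expected to filter on.
conn.execute("CREATE INDEX idx_articles_author ON articles (author)")
plan_after = query_plan(conn, QUERY, ("A. Writer",))

matches = [title for (_id, title) in conn.execute(QUERY, ("A. Writer",))]
```

Comparing `plan_before` with `plan_after` shows whether the database picked up the new index for that query; the query results themselves are the same either way.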
For a better explanation, see http://www.mysql.com/doc/en/MySQL_indexes.html and, actually, that whole section on optimizing your database for performance.
I believe that any relational database will give you the results you need. In order to satisfy your curiosity, research your options (and their related costs) and proceed from there to choose one. Open-source databases typically have more online support available and are less pricey.
msburton posted this at 02:06 — 12th December 2003.
i wish i had discovered this place before finishing my most recent site. much more helpful than usenet, and without the attitude.
Suzanne posted this at 03:30 — 12th December 2003.
hey now, I work hard on my attitude! don't be bursting my bubble!
I do hope this is of help to you. There are many very solid contributors here and we're always happy to have more.