XML -- various

Suzanne

She has: 5,507 posts

Joined: Feb 2000

I have a client with thousands of documents in various layout programs (Ventura, Quark, PageMaker), as well as HTML and PDF, and they want to move all of them to XML.

Okay, well that's great! But we need to come up with a standard, streamlined way to move ALL the documents to XML. Yes, at first we will really need six ways to do it, but that's okay.

What I need to know is whether anyone has had experience with this, and if so, whether they could please explain to me the value of a database for XML documents that need to be searchable in a variety of ways, whether a flat-file index would be just as good (and save $$ on the database), or whether the database has benefits I am not aware of (not even remotely unlikely!).

I know this is kind of huge, but if anyone out there with experience in this can give me an overview or point me to the right resources, please let me know!

Thanks, ever so much!

[smile] Suzanne

Peter J. Boettcher

They have: 812 posts

Joined: Feb 2000

Suzanne,

As far as replacing a database backend with XML goes, XML was never meant for that. It's better than using a flat text file, but for any serious web application you'll still need some sort of database backend.

I use XML/XSL to create data islands: I pull the data from the database and store it as XML, which lets the client manipulate it without any round trips back to the server. It's also great for storing fairly static data, which takes some load off the database server.
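A minimal sketch of the server-side half of that data-island idea, in Python with the standard-library sqlite3 and xml.etree.ElementTree modules: run a query and turn each row into an XML element whose children are named after the result columns. The docs table and its columns are hypothetical, chosen only for illustration; the thread doesn't say which database or language Peter uses, and the XSL transform and client-side handling are not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag="documents", row_tag="document"):
    """Run a query and serialize every row as an XML element whose
    child elements are named after the result columns."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]  # column names from the query
    root = ET.Element(root_tag)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO docs (title, dept) VALUES (?, ?)",
    [("Annual report", "Finance"), ("Press kit", "Marketing")],
)
xml_island = rows_to_xml(conn, "SELECT id, title, dept FROM docs ORDER BY id")
print(xml_island)
```

ElementTree escapes characters such as & and < in the text for you, which is one reason to build the XML with a library rather than by string concatenation.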

If your client is already at thousands of documents, I would recommend using a proper database backend. Storing that many records in a flat file or in XML files could become hard to manage, as well as insecure.

To see some examples, try http://www.xml101.com; there are some good ones there.

Hope that helps a little. [smile]

PJ | Are we there yet?
pjboettcher.com

Suzanne

Specifically, what is the advantage of having the documents in a database over, say, just keeping them in a folder system, crawling the folders regularly and building an index?

(I'm not advocating any method at the moment; I am trying to get my head around how the database would function and why, et cetera.)
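For comparison, a rough sketch of the folder-crawl alternative: walk a directory tree, pull the title out of each XML file, and save the result as a flat JSON index. It assumes, hypothetically, that every document has a top-level <title> element; build_index and save_index are illustrative names, not part of any existing system.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def build_index(root_dir):
    """Crawl root_dir and map each .xml file's relative path to its <title>."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            try:
                title = ET.parse(path).getroot().findtext("title", "")
            except ET.ParseError:
                continue  # skip malformed files instead of aborting the crawl
            index[os.path.relpath(path, root_dir)] = title
    return index


def save_index(index, index_path):
    """Write the index as a flat JSON file that a search page could read."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2, sort_keys=True)


# Demo against a throwaway folder tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "press"))
    with open(os.path.join(tmp, "press", "release1.xml"), "w", encoding="utf-8") as f:
        f.write("<article><title>New office opens</title></article>")
    demo_index = build_index(tmp)
    index_file = os.path.join(tmp, "index.json")
    save_index(demo_index, index_file)
    with open(index_file, encoding="utf-8") as f:
        reloaded = json.load(f)
```

With this approach the index is only as fresh as the last crawl, and any search on a field the crawler didn't record means opening the files again.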

Also, and do forgive me, I am not a programmer. I work predominantly in the middle between front end and back end, the programmers I work with are unfamiliar with XML (though they are on the same road to knowledge!), and we want to get this right, right out of the gate.

These are whole documents, articles, press releases, et cetera. They may be pulled into a larger surrounding "interface" or they may be standalone, that hasn't been decided.

Could you tell me (or point me further to) the advantages and functions of using a database for whole documents? I know it's really a newbie question, but in this case I really am a newbie. I have only used databases for inventory-type work, where I have products, prices, sales, inventory levels, yada yada, or for things like a database of customer transactions, with name, address, et cetera.

I am thinking this through at the moment: putting the documents into the database would be one thing, and then you can pull them out into XML? I think I see. Could you just clarify a bit for me?

Much appreciation; you've been a bit of a white knight for me lately!

[smile] Suzanne

Peter J. Boettcher

I think I misunderstood your original question. Storing whole documents directly inside the database is not very efficient from the database's perspective. Databases are good at storing and retrieving small, regular values (chars, integers, etc.), but retrieving a whole document from a field takes more processor time. It makes more sense to store the document in the file system and store only the document's link in the database.

XML won't help you with storing the actual documents either, only the specifics like title, link, etc.
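A minimal sketch of that arrangement, assuming Python and SQLite (neither is named in the thread): the file is copied into a storage folder, and only its title, department and path go into the database. register_document, the documents table and the storage layout are hypothetical names for illustration.

```python
import os
import shutil
import sqlite3
import tempfile


def register_document(conn, storage_dir, src_path, title, department):
    """Copy the file into storage_dir and record only its metadata and
    path in the database; the file contents never go into a column."""
    os.makedirs(storage_dir, exist_ok=True)
    dest = os.path.join(storage_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        raise FileExistsError(dest)  # don't silently overwrite a stored file
    shutil.copy2(src_path, dest)
    cur = conn.execute(
        "INSERT INTO documents (title, department, path) VALUES (?, ?, ?)",
        (title, department, dest),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           department TEXT,
           path TEXT NOT NULL UNIQUE)"""
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "upload.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<article><title>Q3 results</title></article>")
    doc_id = register_document(conn, os.path.join(tmp, "store"), src, "Q3 results", "Finance")
    stored_path = conn.execute("SELECT path FROM documents WHERE id = ?", (doc_id,)).fetchone()[0]
    stored_ok = os.path.exists(stored_path)
```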

From the sound of your application, the best way would be to develop a form that lets users upload documents to the server and saves all the document details to the database.

I have developed a similar application for my company's intranet. I created a form that asked for the document title and department, plus a file field that let users select a file from their computer/network and upload it to the server.

This creates an organized document index that can be searched and updated very easily.
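Searching such an index is then an ordinary SQL query. A sketch, again assuming SQLite and a hypothetical documents table like the one described above, with optional keyword and department filters bound as parameters rather than pasted into the SQL string:

```python
import sqlite3


def search_documents(conn, keyword=None, department=None):
    """Return (title, path) rows matching an optional title keyword and an
    optional department. Values are bound as parameters, never concatenated."""
    sql = "SELECT title, path FROM documents WHERE 1 = 1"
    params = []
    if keyword:
        sql += " AND title LIKE ?"
        params.append(f"%{keyword}%")
    if department:
        sql += " AND department = ?"
        params.append(department)
    sql += " ORDER BY title"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, department TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO documents (title, department, path) VALUES (?, ?, ?)",
    [
        ("Holiday schedule", "HR", "docs/hr/holidays.doc"),
        ("Expense policy", "Finance", "docs/fin/expenses.doc"),
        ("Expense form", "Finance", "docs/fin/expense-form.doc"),
    ],
)
# An index on the column you filter by most keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_documents_department ON documents (department)")

print(search_documents(conn, keyword="expense"))
print(search_documents(conn, department="HR"))
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so "expense" matches "Expense policy"; other databases may behave differently.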

I don't think using XML would make sense for this application unless using a real database is not an option.

Suzanne

I just sat down and tortured my poor programmer and hashed out, to some extent, what we are doing. The whole documents will be in a relational database, with each field being an XML tag. The XML part is non-negotiable. There won't be entire documents in any one field; I simply wasn't getting it! But I *so* get it now, and I am so excited I can't contain myself.

I understand the database part now and how it will work, and adding a system for users to add new documents is part of the process (for the smaller documents; some are books, so would the process be Quark > XML > database > XML > web application?).
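The XML > database > XML leg of that process can be sketched as a round trip: slice an article into one column per child tag, store the row, then rebuild the XML from it. The <article> structure and the FIELDS list are hypothetical; the real tags would come from whatever DTD or schema the project settles on.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical article structure: one database column per child tag.
FIELDS = ("title", "author", "pubdate", "body")
COLUMNS = ", ".join(FIELDS)
PLACEHOLDERS = ", ".join("?" for _ in FIELDS)


def xml_to_row(xml_text):
    """Slice an <article> into a tuple of field values (XML -> database)."""
    root = ET.fromstring(xml_text)
    return tuple(root.findtext(tag, "") for tag in FIELDS)


def row_to_xml(row):
    """Rebuild the <article> from a stored row (database -> XML)."""
    article = ET.Element("article")
    for tag, value in zip(FIELDS, row):
        ET.SubElement(article, tag).text = value
    return ET.tostring(article, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE articles (id INTEGER PRIMARY KEY, {', '.join(f + ' TEXT' for f in FIELDS)})")

source = (
    "<article><title>Spring catalogue</title><author>J. Smith</author>"
    "<pubdate>2000-03-01</pubdate><body>New titles this season.</body></article>"
)
conn.execute(f"INSERT INTO articles ({COLUMNS}) VALUES ({PLACEHOLDERS})", xml_to_row(source))
stored = conn.execute(f"SELECT {COLUMNS} FROM articles WHERE id = 1").fetchone()
rebuilt = row_to_xml(stored)
```

This only works for flat text fields: findtext() returns just an element's own leading text, so a <body> containing inline markup such as <em> would lose everything from the first child element on. Fields with mixed content would need to be stored as serialized XML instead.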

What we will likely be doing is combining things with JSP for regularly viewed articles/pages/chapters, yada yada, but I just have to say THANK YOU! for helping me get past this impasse I was having over the database issue.

We won't be allowing any file uploads at this stage, though that will come in the future. A complete XML file will need to be sent through some sort of parsing process so it can be sliced into the bits for the database. First steps first, though.
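For that parsing process, a sketch of how a large file such as a book could be sliced into rows without loading it all into memory, using ElementTree's iterparse. The <book>/<chapter n="..."> structure and the chapters table are assumptions for illustration; the actual Quark export format isn't described in the thread.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_chapters(conn, xml_source):
    """Stream a (possibly large) book file and insert one row per <chapter>.
    Each chapter is cleared once stored, so memory use stays small."""
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "chapter":
            continue  # <title>/<body> end events are handled via their chapter
        conn.execute(
            "INSERT INTO chapters (number, title, body) VALUES (?, ?, ?)",
            (int(elem.get("n")), elem.findtext("title", ""), elem.findtext("body", "")),
        )
        count += 1
        elem.clear()  # free the chapter's subtree now that it is in the database
    conn.commit()
    return count


book = b"""<book>
  <chapter n="1"><title>Beginnings</title><body>Text of chapter one.</body></chapter>
  <chapter n="2"><title>Middles</title><body>Text of chapter two.</body></chapter>
</book>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chapters (number INTEGER PRIMARY KEY, title TEXT, body TEXT)")
loaded = load_chapters(conn, io.BytesIO(book))
```

In practice xml_source would be a file path rather than an in-memory buffer; iterparse accepts either.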

Thanks, Peter, you're my hero.

[smile] Suzanne
