Number of Tables in a Database

They have: 141 posts

Joined: Aug 1999

I was wondering if it hurt performance to have a large amount of talbes in a dbase. I am creating a forum software that will use threaded posts. Do you think making a seperate table for each thread would bog down the database? Should I go with one large table and link everything on that. Any ideas?

Thanks

------------------
Visit the dLo.Network
http://dlo.net

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Randall,

I wouldn't make one large table, as that table gets larger your queries would get slower and slower.

Your idea of making a separate table for every thread sounds like a good one to me, this would be much more efficient and will not hurt the performance of your database.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 31 posts

Joined: Dec 1999

Well...
I am in a process of making a forum using a database two.
I am using Access...and my idea was to have a table for the newest 100 messages (or the messages of the past 30 days or something) and a table for the old messages.
I don't actually like this idea 100%
However the idea of making a table for each thread is kind og weird. And I have two questions:

1. If you have 1000 threads, are you gona have 1000 tables?
2. If you want to make the list with the first message title of every thread, how are you gonna do this?

Coddy

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Coddy,

1000 tables is nothing in SQL. I've worked on relational databases containing close to 30,000,000 records in thousands of tables.

Obviously you might want to do some tracking and setup a trigger to delete tables that are a year old or something.

As far as creating a new table it's fairly easy:

CREATE TABLE tablename
(ThreadID int PRIMARY KEY, ThreadTopic char(50), ThreadComment text)

You would assign some autonumber to tablename, and add whatever fields you'd want.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

Vincent Puglia's picture

They have: 634 posts

Joined: Dec 1999

Hi,

[eager to jump in]
Peter's right.
Consider: each table will have 3 columns and at most 50 rows. So, select * from TableX should be almost instantaneous.
The alternative would be to have 4 or more columns (to separate each thread within the table) and umpteen rows. All of which means you would have to search 2 or more indices to find the thread.
You might even consider adding an additional table hooked to the rest -- a sort of thread index -- so that user's could search for a phrase and get all of the appropriate rows.

Vinny

------------------
GrassBlade: cut&paste javascript

Where the world once stood
the blades of grass cut me still

They have: 31 posts

Joined: Dec 1999

Peter,
I was talking about Access. I don't think that access can handle so many tables...
Anyway...

Coddy

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Coddy,

Access can handle 1000 tables or more, it just won't be as fast as a dedicated solution like SQL.

Regards,
Peter J. Boettcher

They have: 103 posts

Joined: Apr 1999

I never thought of having a table for each thread, but that sounds like a pretty good ideas to me. But how easy is it to create a new table in MySQL with a PHP script? There's always the exec() command, but I'd like to see an example of what you were thinking, Randall.

------------------
Gil Hildebrand, Jr.
Senior Web Developer, 4atcost.com

Gil Hildebrand, Jr.
Internet Consultant
New Orleans, LA

They have: 30 posts

Joined: Apr 2000

The new forum I created uses a seperate table for each thread. Create a new table in mysql just involves sending a CREAT TABLE tablename... query to the mysql_query function. I decided to go this rought and have another table just to index each thread and divide them into subcatagories. However my current forum only has one catagory and has a few bugs in the script.

------------------
Open Source is the future!

They have: 32 posts

Joined: Mar 2000

I'm not sure why everyone seems to think you need a new table for each thread -- this sounds like a very bad idea.

If you index your table properly, all you will need is one table to contain all your threads.

Picture this as a table structure:

Thread ID
Message ID
Message Date
Author
Subject
Message Text

If you have indexed the thread ID, then you can do a query WHERE the thread ID equals the one you want. This will not have to search the entire database; since it is an indexed query it will be extremely fast.

Your queries will not get "slower and slower" as the table grows -- they may get slightly slower, but "slower and slower" suggests that if you double the number of records in the table your query time will double too -- perhaps they increase from 0.05 seconds to 0.07 seconds to find a thread, but that is not "slower and slower".

You may want to store your thread subjects in a separate table so that you can present an "index of threads". That will be faster than doing a "select distinct" on the table to get a list of threads.

Consider this: if you have a separate table for each thread, how will you keep track of all these tables? How can you search for a phrase in one of the messages?

I think the "table for each thread" option is a very bad design -- it treats SQL tables as though they are pure flat files, which they are not. Doing indexed record retrieval is the fastest way to get data from a database.

To lend myself some credibility, I run the site hockeydb.com. My statistics database contains over 200,000 records in it. To select all the stats from any given player, going by his player ID, mySQL reports that it takes 0.00 seconds -- not a typo, it simply took less time than mySQL has precision. The retrieval is instantaneous because it is an indexed retrieval.

If I had one table for each of my 45,000 players, I would have to first figure out the name of my table (I could make it easy by naming the table the same as the "player id") and then select * from that table with no WHERE clause

It might take 1/1000 of a second less time to do that select, but let's say that I want to see all of the players that played on a certain team -- how would I get this information since it is stored across 45,000 different tables?

Ralph Slate
http://www.hockeydb.com

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Ralph,

It seems that you don't have a lot of experience with relational databases. 200,000 records is nothing, we're talking millions here. I think that you could design a forum database both ways, but between the two, the most efficient will ALWAYS be the relational database. Granted it might be harder to develop up front but it will be a better product in the end.

Keeping track of the tables would be extremly simple, as has already been suggested you could setup an index table, that holds all this information. Searching would also be simple using this same table to get the table names and then search.

And your queries do get slower and slower, I'm not saying they double in length but try doing a query on 5 records and then try 500, you WILL notice quite a difference, not only on query speed but if you haven't setup database paging then you'll also take a hit on page display speed.

I'm glad your forum works well for you, and if you're happy with it that's great. I'm just stating what I feel would be the most efficient way to design a database based forum.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 32 posts

Joined: Mar 2000

It was very nice of you to start your post out with an attack on me and my alleged lack of experience (which isn't true -- I've been working with databases for over 10 years, and even longer if you count the simple text database I created on my Atari 800).

I have in front of me a relational database with 14,000,000 records in it. By selecting based on the primary key, I can return a single row back in 1.3 seconds. Internet latency is longer than that, so 1.3 seconds vs 0.3 seconds is not even noticeable.

If you actually read my message you'd realize that:

1) I have no forum under development (so when you say "I'm glad your forum works well for you" it makes no sense).

2) I'm not advocating a non-relational solution to this problem.

When you say "200,000 records is nothing, we're talking millions here", I didn't see that in the original question, even though it doesn't really matter.

Also, when you say "Searching would also be simple using this same table to get the table names and then search.", that would not be so simple; in order to search all threads, you'd have to formulate 1 query per table in your database -- in other words, if you have 1000 threads, to search for the word "hockey" in a message (which is part of a thread) you would have to formulate and execute 1000 separate queries. That would be horribly slower than searching a single table -- even if you do have to do a table scan, its better than doing a table scan on 1000 separate tables.

I completely disagree with your point and design that having multiple tables for each thread is the way to go. It defies relational design philosophy -- you've created a "sixth normal form" here.

Instead of keeping similar data in the same table, you're creating separate tables based on one of the columns in the table (which you're now using as the table name). Breaking your data up doesn't achieve anything here -- the only reason I'd ever recommend that would be for security purposes, or maybe to make it easier to archive large volumes of data by keeping them in different tables/tablespaces.

Indexing is the key. As I've shown above, an indexed search is instantaneous if the indexes are built properly. It doesn't matter if you have 100 rows or 10,000,000 rows. And for very large databases, partitioning them is the only way to go.

Advocating the creation of a separate table for each thread in a message forum is a very bad design. If you have, as you suggest, millions of records in your forum, then you will probably have hundreds of thousands of tables. That is a ton of extra overhead -- storage-wise, creating a new table per thread probably means 40k or so extra per table. Multiply by 500,000 and you have 20 gigs of wasted space.

Again, with proper indexing there is no need to partition your tables so thinly. How many messages would each table contain? 10-20? Maybe 100? In rare cases 1000? But on average 1-10, most likely. If you use thread id/message id as a primary key, and create a separate index table to store the thread descriptions, and index your message date, you will see better performance than having tens of thousands of tables with exactly the same structure.

Ralph
http://www.hockeydb.com

They have: 5,633 posts

Joined: Jan 1970

I've been lurking on this thread, and all I have to say is that any database design that partitions DATA by using database OBJECTS (i.e. tables) totally flies in the face of relational database design and theory. I'm not saying it won't work, only that it is a horrible idea. Anyone that disagrees with what Ralph is saying should go buy a good book on relational database design and read it abouy 10 times until you understand it. I get paid ALOT of money to design HUGE databases (not millions or tens-of-millions of records, but hundreds-of-millions of records) everyday. If you don't understand indexed retrievals as Ralph has described TWICE, than you shouldn't be designing databases at all. And as far as database performance increasing with table size, that is obviously true for full table scans, but it is totally WRONG for any field that is properly indexed. You can have a table with half a billion records in it, and if it is built properly, a fully indexed retrevial can take milliseconds. If you don't understand how that is, that please re-read my first suggestion.

And for people who are too lazy to read a book, here's a quick SQL lesson for you: DDL (Data DEFINITION Language) is used to DEFINE (CREATE, DROP) objects in the database - tables, indexes, views, etc. DML (Data MANIPULATION Language), as you can guess, is used to MANIPULATE data in the relational database (SELECT, INSERT, UPDATE, DELETE). Any design that used DDL for DML purposes is just plain bad design, by ANY standard.
There is a HUGE amount of overhead in relational database servers to DEFINE objects. DB Servers are optimized to MANIPULATE data and indexes. I'm also not saying that DB servers can't handle thousands of tables - they obviously can, if you need to store thousands of DIFFERENT datasets.

Think of it this way: A good database design is one where you DEFINE the structure/schema once, and then leave it alone. ALL database activity is then done using DML.

-Hank

They have: 31 posts

Joined: Dec 1999

How can someone index for exapmle the 'thread ID' , using an Access database?
Does the Primary Key do the job?

Coddy

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Guys,

I don't want to get into a chest thumping match, it's pointless. I will justify myself though, I worked on one of the largest database/asp projects in Canada (ClubZ member site) they're currently over 24,000,000 registered users doing millions of transactions per month, let's just say it's a VERY large database.

I will apologize for my statement to Ralph, it was a little harsh, sorry.

I'm glad some of you experts are finally coming out of the woodwork, what took you so long?

Hanke,

I'm a little puzzled by your statement of using DDL for DML purposes as bad design? What exactly are you saying? Creating a table on the fly that is automatically indexed seems like quite a good and powerful design and it would seem a little bit strange to ignore using this power.

I agree a good schema shouldn't need to be redesigned but I found some DBA's get so hung up on this fact that they don't accept change to well. Sometimes a more dynamic approach to database design can achieve quite amazing results.

I will still stand by my original opinion. I still believe it's the best design, without the need of indexing, and I still believe it to be more flexible.

Anyway I've got to get back to work. Thanks for all the comments.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 32 posts

Joined: Mar 2000

I think the basic point behind HankE's comments is that databases are optimized to insert data into columns (DML), so such an operation would be very fast. They are not optimized to create tables (DDL), so they do these things more slowly because most databases change very infrequently.

If you go with the "one table per thread" design, here is what you would have to do to add records to the database:

1) Check to see if it is a new thread. Maybe this would be done automatically by someone pressing "new", I don't know.

2) Get an ID for the thread. You'd have to keep this counter in a separate table; you couldn't take advantage of things like IDENTITY columns or SEQUENCES.

3) Issue DDL saying "create table..." naming the table some scheme based on the thread ID.

4) Insert the records into the table you just created.

When retrieving data from this design, you'd have to:

1) Select the data from the table that has the name derived from your thread ID. The proposed theory is that this would somehow be faster than keeping the data in one table because there are fewer records in the table. I guess there is also a proposal on the table that calls for no indexes (although I don't know why having indexes is necessarily a bad thing) -- in fact, if each table only has less than 100 records, indexing would probably make no difference -- a tablescan would be fast.

Now suppose that the assumption that it takes longer to access data from a table that has 1,000,000 records in it is false. You now have a very complex database (hundreds of thousands of tables) and no other benefit -- the only benefit to designing it that way was for this supposed performance gain. You also have a very complicated scheme to add data to your table. And what happens if someone says "we want to start storing information about a user's birthdate -- if you didn't create that field on your table template, you'll have to add columns to hundreds of thousands of tables to make them all standard. I wouldn't want to have to do that.

The point I am trying to make here is that retrieving data from a table that is properly indexed is effortless. You haven't proven that retrieving data from such a table is bad, performance-wise. Yet it's the crux of your design.

You mention that you worked on ClubZ. Did you keep every user in your user database in a separate table? Extending your argument logically, it would have been very fast to retrieve a record from a table that only has one record. So why didn't you design it that way?

If you kept all your users in one table, did you have performance problems when trying to retrieve data about one of them? If you did, did the table have a primary key?

If you created one table for each user, how would you have handled it if someone said to you "give me a list of all the users in Ontario"? You'd have to get a record from each table to see if that record had Ontario as the province. It would take a very long time to go through hundreds of thousands of tables.

If you kept them all in one table, it would be a simple query -- a single tablescan if the column was not indexed, and an index retrieval if it was.

Answering the previous question "How can someone index for exapmle the 'thread ID' , using an Access database? Does the Primary Key do the job?", the primary key should define a unique record in the table. It can either be a natural primary key (such as date/user) or it can be an artificial primary key (which would be a sequential number). There are appropriate times for either, although using an artificial primary key coupled with a unique index on the natural primary key is a pretty good design because it makes refering to a record very easy yet still preserves the uniqueness of your natural record set.

To add a simple index in Access, open the table in design view, go the "view" menu, and select "indexes". You can then create indexes at will.

Ralph
http://www.hockeydb.com

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Ralph,

You make a valid point about adding new fields to a table. This would be a pain with my design, although not impossible if you numbered your tables logically you could automate it, but I have to admit it would be a pain, and not efficient. Although for a forum I think you would have must of the field names up front.

Regarding ClubZ, the customers were not stored in separate tables, but the data isn't used for a forum so it's not a good comparison. Stuff like province and city would be linked to a separate table, to conserve space.

I agree that a table that is properly indexed will be effiecient, I missed that in your first post. How often do you update your indexes?

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 32 posts

Joined: Mar 2000

I'm not sure we're talking about the same thing. When I say "indexes", I mean the physical database index that is updated always -- I don't mean a separate "index table" that might have things like thread subjects in it.

Physical indexes are always maintained by the database, and require no manual updates. You may want to rebuild them from time to time to make sure they are stored efficiently -- physical order matters in most databases.

The only caveat on indexing is that 1) indexes take disk space, and 2) they slow down insert operations because an insert needs to not only insert in the table, but must also insert into the indexes.

Ralph
http://www.hockeydb.com

They have: 30 posts

Joined: Apr 2000

Is it necessarry to specify queried columns in a table? Like say there is a feild named ID, as the Primary Key. How about secondary keys, do you define those when creating a database?

------------------
Open Source is the future!

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Patrick,

I'm not sure what you mean. Do you mean if you have to declare secondary keys when you make your statement?

Sorry, but I'm really not sure what you mean.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 30 posts

Joined: Apr 2000

I know you don't have to specify secondary keys, but is there a way to make columns you query often resident in memory, or something to that effect?

------------------
Open Source is the future!

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Patrick,

Three letters, XML !

What you describe is one of the benefits of XML/XSL. Pulling selected records out of a database, holding them in XML and letting the user manipulate that data without going back to the server.

You could also do this by storing the records in a client side (JavaScript) array, by using ASP to write the values into the JavaScript.

Unfortunately, both these methods require at least an intermediate level of programming knowledge, and both would be quite a project to do, and currently XML support by the browsers isn't there yet.

Acutally, using ASP you could probably store something using the application object. Since this object is shared by all users on your website this could work. I'll have to think on this one!

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 30 posts

Joined: Apr 2000

No, I mean on the server side, the the column is in the servers memory and can be parsed faster

------------------
OMG It's Furious George!!!

They have: 453 posts

Joined: Jan 1999

Hi,

I've been gone for a while, so fogive me for my late answer.

The way to go is letting the database to what it was designed for and you doing your part of the job.

Some time ago I worked on an internal database that hat over half a billion entries in one table and the response time was much lower than one second.

What do you think you will gain by splitting your tables ?
A good database does this (based on your keys) anyway. And it does that much faster than you can ever do it in Perl/PHP/Rexx/etc.

Just save yourself the time (and headache) and use it to read a good database modeller book. No, not "SQL for Dummies" or "Access in 21 Days". Get something real.

just my $.02

ciao
Anti
ps:
Everybody seems to be writing his own (database driven) board software, but boardzilla seems to be starving ... just wondering...

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Patrick,

I guess the only thing that might speed things up a little bit would be to have a stored procedure that would be called by a trigger or rule.

Regards,
Peter J. Boettcher

PJ | Are we there yet?
pjboettcher.com

They have: 141 posts

Joined: Aug 1999

Hey, I have finished my mysql driven forum. I am still adding features though. Check it out My forum

------------------
Visit the dLo.Network
http://dlo.net

Peter J. Boettcher's picture

They have: 812 posts

Joined: Feb 2000

Randall,

Great work!

Regards,
Peter J. Boettcher

Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.