Database design for an ad server
So I am building an ad server and need a little guidance with the database.
As you can imagine, every time an ad is served or a click happens, a new row is created in a database to record the impression or click.
Obviously, over time, when you have delivered millions of impressions and clicks, this will grow the database at an enormous rate.
So ad servers aggregate the data, say every 15 minutes to 1 hour, by counting the number of clicks and impressions recorded in that window and storing the totals as 1 row in another table, then purging the raw data tables to clear them out.
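A minimal sketch of that rollup-and-purge idea, using SQLite in place of MySQL purely for illustration (the table and column names `raw_events`, `stats_hourly`, etc. are my own assumptions, not anything OpenX actually uses):

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; schema names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (
        banner_id INTEGER,
        event     TEXT,      -- 'impression' or 'click'
        ts        TEXT       -- event timestamp
    );
    CREATE TABLE stats_hourly (
        banner_id   INTEGER,
        hour        TEXT,
        impressions INTEGER,
        clicks      INTEGER
    );
""")

# A few raw events for banner 1, all within the same hour.
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(1, "impression", "2009-12-06 05:10:00"),
     (1, "impression", "2009-12-06 05:20:00"),
     (1, "click",      "2009-12-06 05:21:00")],
)

# Roll the raw rows up into one summary row per banner per hour...
conn.execute("""
    INSERT INTO stats_hourly (banner_id, hour, impressions, clicks)
    SELECT banner_id,
           strftime('%Y-%m-%d %H:00', ts) AS hour,
           SUM(event = 'impression'),
           SUM(event = 'click')
    FROM raw_events
    GROUP BY banner_id, hour
""")

# ...then purge the raw table so it never grows unbounded.
conn.execute("DELETE FROM raw_events")

print(conn.execute("SELECT * FROM stats_hourly").fetchall())
# [(1, '2009-12-06 05:00', 2, 1)]
```

Three raw rows collapse into a single hourly summary row, which is exactly why the summary tables grow so much more slowly than the raw ones.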
My question is: even if you store 1 row per hour per banner and you have 100 banners, that's 100 rows per hour; per day that is 24 × 100 = 2,400 rows; × 7 days = 16,800 rows; × 52 weeks (1 year) = 873,600 rows per year for 100 banners.
So you can imagine what happens if you have 10,000 banners running.
Anyway, how can I store the data without building up an enormous database that will effectively be slow to query?
I will be using MySQL. I suppose MySQL has to have a considerable number of rows in a table or database before it starts running slowly? Say 10 million rows = 11 years based on 100 banners.
If anyone has any ideas, please let me know. I have already asked the OpenX community, but they are really quite bad at answering in the forums.
teammatt3 posted this at 05:52 — 6th December 2009.
He has: 2,102 posts
Joined: Sep 2003
Is your ad server going to be serving 10,000 banners at a time? Be realistic. Everyone thinks their application is going to take off, and few ever do.
873,600 rows isn't a lot. Most of your data is numeric, and won't take up a lot of space. It can also be sorted and retrieved faster than textual data. 8 million records isn't bad at all either.
The data probably doesn't need to be stored for more than a year. Does a person really need to see the IP address of someone who clicked a banner ad 13 months ago? Probably not.
I would highly recommend starting off with a normalized database, and if scalability becomes a problem, buy better hardware. And if the hardware doesn't fix the problem, then start caching, purging, and denormalizing your database.
benf posted this at 13:41 — 6th December 2009.
They have: 426 posts
Joined: Feb 2005
Hello teammatt3.
Yes, you are right. I was not aware of denormalization; it looks interesting. Keeping the data in the database for no more than 2 years is also what I had in mind. It is better to keep the statistics from the previous year so you can see trends in banner delivery, but only keeping the past 2 years keeps the volume down.
Thanks for the advice, and any more you can give would be appreciated.
Shaggy posted this at 15:14 — 7th December 2009.
They have: 121 posts
Joined: Dec 2008
Why would you ever throw data away? (Says the data packrat)
Normalisation will help with your storage size. A well-placed index or three will keep you collecting data for a long, long time before more drastic measures need to be taken. Hundreds of millions of rows aren't out of reach on basic hardware.
If/when data volume becomes a problem, partition it off. Move it to a different tablespace, or even a different database. There will come a day you'll wish you'd kept it.
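The move-it-to-another-database idea can be sketched like this (again with SQLite in place of MySQL; `stats_hourly`, the archive name, and the cutoff date are all made up for the example). In SQLite a second database can be attached in-process; with MySQL you would instead copy old rows to an archive table or server and delete them from the live table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stats_hourly (
        banner_id INTEGER, hour TEXT, impressions INTEGER, clicks INTEGER
    )
""")
conn.executemany(
    "INSERT INTO stats_hourly VALUES (?, ?, ?, ?)",
    [(1, "2007-01-01 00:00", 50, 2),   # old row, past the cutoff
     (1, "2009-12-06 05:00", 80, 3)],  # recent row, stays live
)

# Attach a second database (in-memory here; on disk or on another
# server in practice) and mirror the table structure into it.
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.execute("""
    CREATE TABLE archive.stats_hourly AS
    SELECT * FROM stats_hourly WHERE 0
""")

# Move rows older than the cutoff out of the live table,
# so nothing is thrown away but the hot table stays small.
cutoff = "2008-01-01 00:00"
conn.execute(
    "INSERT INTO archive.stats_hourly SELECT * FROM stats_hourly WHERE hour < ?",
    (cutoff,),
)
conn.execute("DELETE FROM stats_hourly WHERE hour < ?", (cutoff,))
```

The live table keeps only the recent row, while the archived row is still queryable when someone wants those multi-year trends.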
Cheers,
Shaggy.