Looking for a simpler solution

Greg K's picture

He has: 2,145 posts

Joined: Nov 2003

We have a project here that we are working on, that has a report summary that we are trying to determine the best way to get the reports, based on speed and sever load. Here is the scenario:

A client can go in and adjust a value used in a calculation, say he goes in several times to update it:

Date:       Value Set:
==========  ==========
April 26     4,000.00
April 29     5,000.00
May 10       9,000.00
May 13       5,000.00
May 21       6,000.00
May 30       5,500.00
June 3       7,000.00
June 11      5,000.00
'
Now, for the reports, we need to find the average value for the month of May. The prototype I'm building off of went and took all the average for all entries for that month ((9,000 + 5,000 + 6,000 + 5,500)/4) = 6,375.

However, these values are for daily activity of a business, so this method doesn't take into consideration that the spike of 9,000 only lasted 3 days and the initial value at the begining of the month (5,000 set back on April 29th). To get proper value we are looking for, we would need to do:

(1st - 9th):   5,000 * 9 = 45,000
(10th - 12th): 9,000 * 3 = 27,000
(13th - 20th): 5,000 * 9 = 45,000
(21st - 29th): 6,000 * 9 = 54,000
(30th - 31st): 5,500 * 2 = 11,000
                          =======
                          182,000  / 31 (total days in May) = 5870.97
'Here we get the correct average for the month, about $500 less.

Now, I'm seeing the easiest way program wise to add this to the program is, whenever you add a record for a day, it goes and adds a value for each day since the last insert (i.e.. on May 21st, it would loop through 8 times and insert a record for the 14th - 20th (with the value from the 13th) and then the new value of on the 21st. Then we can call the database and just get an average where month = May.

However, we are wondering if there is a simpler method to do this where is calculates in a value for missing days as well. Additionally we are concerned over if we did do our own function to do the caclulate missing values between the entry days, would the speed of such a function be considerable. (Keep in mind the reports are eventually going to be able to span over several months, and hopefully if things go well, years).

Thank you in advance for any suggestions on making this problem more simplified and optimized.

-Greg

PS. The main downfall I see to my easiest solution is, there may be clients who may not do any updates for a few months, and for 3 months, that would be running about 90 INSERT statements...)

He has: 51 posts

Joined: Jan 2004

It is much easier to do inserts and let the DB do the calcs, but I do not think the speed difference between your function and the DB calc will be too different. However, if you say the value may not change for weeks or months, then your function would prolly be more efficient, since you'll be taking shortcuts the DB cannot.

If development time is more precious right now, then let the DB do the calcs and create an optimized function later in the life of the project.... "hopefully if things go well"

Mark Hensler's picture

He has: 4,048 posts

Joined: Aug 2000

I would advise use of a history table...

+----+--------+-------------+--------------+-----------------+
| id | client | change_date | change_value | change_duration |
+----+--------+-------------+--------------+-----------------+
|  1 |    123 | 2004-05-01  |         5000 |               4 |
|  2 |    123 | 2004-05-05  |         6500 |               7 |
|  3 |    123 | 2004-05-12  |         7000 |               0 |
+----+--------+-------------+--------------+-----------------+
'If change_duration is zero, then change_value is the currently effective value. When the user changes the value, run an update to calculate the duration for the last value and update the previous record. Then insert a new row with the new value and the days date.

Using PHP (or ASP), query the table for dates required, then perform your math:
change_value * change_duration / SUM(change_duration)

Mark Hensler
If there is no answer on Google, then there is no question.

Greg K's picture

He has: 2,145 posts

Joined: Nov 2003

Mark, I like your idea, would save on the tons of "inserts" for each day they didn't save anything in my approach. The only trick still remaining is then finding it for just one month, and say they didn't do an update from 4-23 until 5-3.

Maybe when it has an update, just do an insert of a record for the begining of each month?

-Greg

Mark Hensler's picture

He has: 4,048 posts

Joined: Aug 2000

Unless there is a way to calculate the number of days between April 23 and the end of the month. I don't know of a way off the top of my head (in sql).

Inserting a record for the begining (and maybe end) of the month may be the easiest solution.

Mark Hensler
If there is no answer on Google, then there is no question.

He has: 51 posts

Joined: Jan 2004

Yes, Mark's idea is good. Also, to calculate the number of days between a given day and the end of the month in mySQL you can use what's in mySQL Cookbook in Recipe 5.22

SELECT d,
DAYOFMONTH(DATE_SUB(
DATE_ADD(DATE_SUB(d,INTERVAL DAYOFMONTH(d)-1 DAY), INTERVAL 1 MONTH),
INTERVAL 1 DAY))
AS 'days in month'
FROM date_val;
'

Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.