MySQL Group By Seven Days

If we have a daily expense table like

+------+------------+
| cost | date       |
+------+------------+
|   44 | 2010-07-30 |
|   39 | 2010-07-31 |
|   18 | 2010-08-01 |
|   71 | 2010-08-02 |
|   69 | 2010-08-03 |
|   62 | 2010-08-04 |
|   89 | 2010-08-05 |
|   72 | 2010-08-06 |
|   46 | 2010-08-07 |
|   23 | 2010-08-08 |
+------+------------+

its very easy to show monthly aggregation. Just a group by clause on date column based on month will do the job

SELECT	SUM(cost) AS total, MONTHNAME(date) AS month 
FROM daily_expense 
GROUP BY month 
ORDER BY MONTH(date);
+-------+-----------+
| total | month		|
+-------+-----------+
|    83 | July      |
|  1641 | August    |
|   554 | September |
+-------+-----------+

Now what we do when its been asked to show weekly aggregation. Using MySQL WEEK function returns the number of the week number of the date. By default the start of the week is Sunday or Monday based on the Server System Variables. Otherwise it can be set while using the function itself. So the query

SELECT	SUM(cost) AS total, CONCAT(date, ' - ', date + INTERVAL 6 DAY) AS week
FROM daily_expense
GROUP BY WEEK(date)
ORDER BY WEEK(date)

gives something like

+-------+-------------------------+
| total | week                    |
+-------+-------------------------+
|    83 | 2010-07-31 - 2010-08-06 |
|   427 | 2010-08-07 - 2010-08-13 |
|   345 | 2010-08-14 - 2010-08-20 |
|   389 | 2010-08-21 - 2010-08-27 |
|   318 | 2010-08-28 - 2010-09-03 |
|   404 | 2010-09-04 - 2010-09-10 |
|   272 | 2010-09-11 - 2010-09-17 |
|    40 | 2010-09-13 - 2010-09-19 |
+-------+-------------------------+

Till now its fairly straightforward and simple. Unfortunately it does not remain so when the same weekly aggregation has to be done but on different start day of the week, other than Sunday or Monday. Is there any option in MySQL to specify different start day of a week? I couldn’t find it, but if anybody knows anything like this please let me know by putting a comment here.
But I couldn’t wait till someone finds a easier way & let me know how. I have my own solution.

The thought
To get any aggregated data from a table we have to use the Group By clause on that field. In our case the field is the day name of a day of a week, like Wednesday, lets say in our case. My plan is to do some kind of operation on all the day up to next Wednesday. This function will change those next days into something similar to Wednesday. Similarly, another function will change last weekdays (like Tuesday, Monday) up to last Wednesday into something similar and unique to the last Wednesday.

Now, what is uniqueness of Wednesday which can be treated logically !! The first thing came to my mind is the day number of the week. MySQL WEEKDAY function gives a number for each day corresponding to the position of the day in a week. Like 2 is for Wednesday. If we use some function to make all other days as 2, we will ed up having all the day in a year as 2.
So, what is the uniqueness of each Wednesday from the other!, across the time! Here MySQL have another function named TO_DAYS. This function gives a unique number for each day since year 0. Now, my plan is to create some operations which for each day gives the same number equals to its previous Wednesday. So from one Wednesday to the next the number remains same and I can apply Group By clause over them to get an weekly aggregation.

The execution
I have an array of weekdays where Monday is 0 & Sunday is 6, same as the WEEKDAY function in MySQL.

$weekArr = array(
			'Monday' => 0,
			'Tuesday' => 1,
			'Wednesday' => 2,
			'Thursday' => 3,
			'Friday' => 4,
			'Saturday' => 5,
			'Sunday' => 6);

So given the weekday, which should be considered as the start day of a week, I can get a number (2 for Wednesday). All the date following it till Sunday has the weekday greater than current Wednesday. So I subtract the day difference between the day and the current Wednesday from the TO_DAYS of the day to make it same as the TO_DAYS of the current Wednesday. I do something like

$startWeekDay = $weekArr['Wednesday']; //2
TO_DAYS(date) - (WEEKDAY(date) - ".$startWeekDay.")

And for all the preceding days up to previous Wednesday, I subtract the day difference between the day and its previous Wednesday. Now the previous Wednesday has exact 7 days different from the current Wednesday. Like

TO_DAYS(date) - (7 - (".$startWeekDay." - WEEKDAY(date)))

And the whole query becomes

SELECT	SUM(cost) AS total, 
		CONCAT(IF(date - INTERVAL 6 day < '".$startDay."', 
				'".$startDay."', 
				IF(WEEKDAY(date - INTERVAL 6 DAY) = ".$startWeekDay.", 
					date - INTERVAL 6 DAY, 
					date - INTERVAL ((WEEKDAY(date) - ".$startWeekDay.")) DAY)),
			' - ', date) AS week, 
			IF((WEEKDAY(date) - ".$startWeekDay.") >= 0, 
				TO_DAYS(date) - (WEEKDAY(date) - ".$startWeekDay."), 
				TO_DAYS(date) - (7 - (".$startWeekDay." - WEEKDAY(date)))) AS sortDay 
FROM	daily_expense
WHERE	date BETWEEN '".$startDay."' AND '".$endDay."' 
GROUP BY sortDay;

which gives

+-------+-------------------------+---------+
| total | week                    | sortDay |
+-------+-------------------------+---------+
|   158 | 2010-08-01 - 2010-08-03 |  734346 |
|   378 | 2010-08-04 - 2010-08-10 |  734353 |
|   365 | 2010-08-11 - 2010-08-17 |  734360 |
|   393 | 2010-08-18 - 2010-08-24 |  734367 |
|   347 | 2010-08-25 - 2010-08-31 |  734374 |
+-------+-------------------------+---------+

Now a date range can be put to limit the results like, BETWEEN ‘2010-08-01’ AND ‘2010-08-31’.

I believe this can be extended or modified to use any interval of aggregation (thats may be the next part), just have to find a logical equation for each step of the interval.

Advertisements

MySQL Group By Modifier – WITH ROLLUP

The Group By WITH ROLLUP modifier in MySQL can give awesome flexibility in data warehouse kind of projects, where users are displayed different hierarchical inter related data and analytical flashy charts.

Always start with simple example : My project saves daily user visit data for my site. And it displays a table, which also can be displayed in flash charts as the data is present, showing page wise daily visit with daily, monthly aggregate of total visit while MySQL does the aggregation part for us. Something like

We have tow tables, ‘urls’ :

CREATE TABLE url (
 id int(11) NOT NULL AUTO_INCREMENT,
 url varchar(255) NOT NULL,
 page varchar(50) NOT NULL,
 PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO url (id, url, page) VALUES (NULL, 'index.php', 'Home Page'), (NULL, 'contact.php', 'Contact Us');

and a ‘daily_visit’ table :

CREATE TABLE  daily_visit (
id INT NOT NULL AUTO_INCREMENT ,
url_id INT NOT NULL ,
year INT NOT NULL ,
month VARCHAR( 20 ) NOT NULL ,
date VARCHAR( 10 ) NOT NULL ,
visit INT NOT NULL ,
PRIMARY KEY (  id )
) ENGINE = InnoDB ;

INSERT INTO daily_visit (id, url_id, year, month, date, visit) VALUES
(1, 1, 2009, 'August', '2009-08-10', 3),
(2, 1, 2009, 'September', '2009-09-08', 6),
(3, 1, 2009, 'August', '2009-08-15', 9),
(4, 1, 2009, 'September', '2009-09-22', 7),
(5, 2, 2009, 'February', '2009-02-06', 3),
(6, 2, 2009, 'February', '2009-02-24', 4),
(7, 2, 2009, 'April', '2009-04-04', 4),
(8, 2, 2009, 'April', '2009-04-14', 2);

Only one query does all for us. Just add the WITH ROLLUP clause with the Group By clause. Here is the whole script

<?php
$link = mysql_connect('localhost', 'root', '');
mysql_select_db('groupby_modifier', $link);

$sql = "SELECT	page, year, month, date, SUM(visit) as tot_visit
		FROM  daily_visit
			LEFT JOIN url ON url.id = daily_visit.url_id 
		GROUP BY page, year, month, date
		WITH ROLLUP";
$res = mysql_query($sql);
?>
<table width="45%" bgcolor="#CCCCCC">
	<tr>
		<td>
			<table cellspacing="1" cellpadding="1" width="100%">
				<tr>
					<th bgcolor="#FFFFFF">Page</th>
					<th bgcolor="#FFFFFF">Year</th>
					<th bgcolor="#FFFFFF">Month</th>
					<th bgcolor="#FFFFFF">Date</th>
					<th bgcolor="#FFFFFF">Visit</th>
				</tr>
				<?php while($row = mysql_fetch_assoc($res)){?>
					<tr>
						<td bgcolor="#FFFFFF" <?php if(!$row['page']) echo 'colspan="4"';?>><?php echo $row['page'] ? $row['page'] : '<strong style="color:red;">Total</strong>';?></td>
						<?php if($row['page']){?>
							<td bgcolor="#FFFFFF" <?php if(!$row['year']) echo 'colspan="3"';?>>
								<?php echo $row['year'] ? $row['year'] : '<strong style="color:red;">Total</strong>';?>
							</td>
						<?php }?>	
						<?php if($row['year']){?>
							<td bgcolor="#FFFFFF" <?php if(!$row['month']) echo 'colspan="2"';?>>
								<?php echo $row['year'] && !$row['month'] ? '<strong style="color:red;">Total</strong>' : $row['month'];?>
							</td>
						<?php }?>
						<?php if($row['month']){?>
							<td bgcolor="#FFFFFF">
								<?php echo $row['month'] && !$row['date'] ? '<strong style="color:red;">Total</strong>' : $row['date'];?>
							</td>
						<?php }?>
						<td bgcolor="<?php echo !$row['date'] ? '#FFFF8F' : '#FFFFFF';?>" ><?php echo $row['tot_visit'];?></td>
					</tr>
				<?php }?>
			</table>
		</td>
	</tr>
</table>

Now there is lots other things to keep in mind while using the WITH ROLLUP clause. Please refer MySQL Doc for more.

Slow Query in Amazon RDS

Amazon RSD will not give you the SSH access to its database server. That means you don’t have the opportunity to view any of the log files , even the slow query log, for the database. But RDS provides a way to play with the slow queries.

First you have to enable slow query in your ‘parameter-group’ with the slow query time. As you do that, login to your database server through phpMyAdmin or the console and select ‘mysql’ database. You can find a table named ‘slow_log’ there. This table will hold all the queries which are slow depending upon your ‘long_query_time’ parameter.

Now, if your slow queries are many then this table will hold too many queries to handle. You cannot truncate the table or delete any row from there. There is only a way to move all the data from this table to another table named ‘slow_log_backup’, making the slow_log table empty.
To do so you have to run a stored procedure already available in RDS. Select the ‘mysql’ database and you can see some stored procedure listed. Run the procedure named ‘rds_rotate_slow_log’ by running ‘CALL rds_rotate_slow_log’ command.

In the back-end RDS just renames those two tables, it does not actually move the data. Running the command twice will delete the slow log completely from the database server.