Pages

Subscribe:

Ads 468x60px

Labels

2014年3月11日 星期二

MySQL/Pivot table

"pivot table" or a "crosstab report"
(Note: this page needs to be wikified)
SQL Characteristic Functions: Do it without "if", "case", or "GROUP_CONCAT". Yes, there is use for this..."if" statements sometimes cause problems when used in combination.
The simple secret, and it's also why they work in almost all databases, is the following functions:
  • sign (x) returns -1,0, +1 for values x < 0, x = 0, x > 0 respectively
  • abs( sign( x) ) returns 0 if x = 0 else, 1 if x > 0 or x < 0
  • 1-abs( sign( x) ) complement of the above, since this returns 1 only if x = 0
   Quick example:   sign(-1) = -1,  abs( sign(-1) ) = 1,  1-abs( sign(-1) ) = 0

Data for full example:
      CREATE TABLE exams (
        pkey int(11) NOT NULL auto_increment,
        name varchar(15),
        exam int,
        score int,
        PRIMARY KEY  (pkey)
      );

      insert into exams (name,exam,score) values ('Bob',1,75);
      insert into exams (name,exam,score) values ('Bob',2,77);
      insert into exams (name,exam,score) values ('Bob',3,78);
      insert into exams (name,exam,score) values ('Bob',4,80);

      insert into exams (name,exam,score) values ('Sue',1,90);
      insert into exams (name,exam,score) values ('Sue',2,97);
      insert into exams (name,exam,score) values ('Sue',3,98);
      insert into exams (name,exam,score) values ('Sue',4,99);

mysql> select * from exams;
+------+------+------+-------+
| pkey | name | exam | score |
+------+------+------+-------+
|    1 | Bob  |    1 |    75 |
|    2 | Bob  |    2 |    77 |
|    3 | Bob  |    3 |    78 |
|    4 | Bob  |    4 |    80 |
|    5 | Sue  |    1 |    90 |
|    6 | Sue  |    2 |    97 |
|    7 | Sue  |    3 |    98 |
|    8 | Sue  |    4 |    99 |
+------+------+------+-------+
8 rows in set (0.00 sec)

mysql> select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4
from exams group by name;

+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |    77 |    78 |    80 |
| Sue  |    90 |    97 |    98 |    99 |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)

Note, the above pivot table was created with one select statement.
Let's decompose to make the trick clearer, for the second exam:
mysql> select name, score, exam, exam-2, sign(exam-2), abs(sign(exam-2)), 1-abs(sign(exam-2)),
       score*(1-abs(sign(exam-2))) as exam2 from exams;
+------+-------+------+--------+--------------+-------------------+---------------------+-------+
| name | score | exam | exam-2 | sign(exam-2) | abs(sign(exam-2)) | 1-abs(sign(exam-2)) | exam2 |
+------+-------+------+--------+--------------+-------------------+---------------------+-------+
| Bob  |    75 |    1 |     -1 |           -1 |                 1 |                   0 |     0 |
| Bob  |    77 |    2 |      0 |            0 |                 0 |                   1 |    77 |
| Bob  |    78 |    3 |      1 |            1 |                 1 |                   0 |     0 |
| Bob  |    80 |    4 |      2 |            1 |                 1 |                   0 |     0 |
| Sue  |    90 |    1 |     -1 |           -1 |                 1 |                   0 |     0 |
| Sue  |    97 |    2 |      0 |            0 |                 0 |                   1 |    97 |
| Sue  |    98 |    3 |      1 |            1 |                 1 |                   0 |     0 |
| Sue  |    99 |    4 |      2 |            1 |                 1 |                   0 |     0 |
+------+-------+------+--------+--------------+-------------------+---------------------+-------+
8 rows in set (0.00 sec)

You may think IF's would be clean but WATCH OUT! Look what the following gives (INCORRECT !!):
mysql> select name,
if(exam=1,score,null) as exam1,
if(exam=2,score,null) as exam2,
if(exam=3,score,null) as exam3,
if(exam=4,score,null) as exam4
from exams group by name;

+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |  NULL |  NULL |  NULL |
| Sue  |    90 |  NULL |  NULL |  NULL |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)

Note: the following does work - is all the maths necessary after all?
mysql> SELECT name,
       SUM(IF(exam=1,score,NULL)) AS exam1,
       SUM(IF(exam=2,score,NULL)) AS exam2,
       SUM(IF(exam=3,score,NULL)) AS exam3,
       SUM(IF(exam=4,score,0)) AS exam4
       FROM exams GROUP BY name;
+------+-------+-------+-------+-------+
| name | exam1 | exam2 | exam3 | exam4 |
+------+-------+-------+-------+-------+
| Bob  |    75 |    77 |    78 |    80 |
| Sue  |    90 |    97 |    98 |    99 |
+------+-------+-------+-------+-------+
2 rows in set (0.00 sec)


mysql> select name,
       sum(score*(1-abs(sign(exam-1)))) as exam1,
       sum(score*(1-abs(sign(exam-2)))) as exam2,
       sum(score*(1-abs(sign(exam-3)))) as exam3,
       sum(score*(1-abs(sign(exam-4)))) as exam4,
         sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam- 1)))) as delta_1_2,
         sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam- 2)))) as delta_2_3,
         sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam- 3)))) as delta_3_4
       from exams group by name;
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 | delta_3_4 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |         2 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |         1 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+
2 rows in set (0.00 sec)

Above delta_1_2 shows the difference between the first and second exams, with the numbers being positive because both Bob and Sue improved their score with each exam. Calculating the deltas here shows it's possible to compare two rows, not columns which is easily done with the standard SQL statements but rows in the original table.
mysql>select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
  sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam- 1)))) as delta_1_2,
  sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam- 2)))) as delta_2_3,
  sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam- 3)))) as delta_3_4,

  sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam- 1))))  +
  sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam- 2))))  +
  sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam- 3))))  as TotalIncPoints
  from exams group by name;

+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 | delta_3_4 | TotalIncPoints |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |         2 |              5 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |         1 |              9 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+
2 rows in set (0.00 sec)

TotalIncPoints shows the sum of the deltas.
select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,
  sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam- 1)))) as delta_1_2,
  sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam- 2)))) as delta_2_3,
  sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam- 3)))) as delta_3_4,


  sum(score*(1-abs(sign(exam- 2)))) -   sum(score*(1-abs(sign(exam- 1))))  +
  sum(score*(1-abs(sign(exam- 3)))) -   sum(score*(1-abs(sign(exam- 2))))  +
  sum(score*(1-abs(sign(exam- 4)))) -   sum(score*(1-abs(sign(exam- 3))))  as TotalIncPoints,

(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG

from exams group by name;

+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| name | exam1 | exam2 | exam3 | exam4 | delta_1_2 | delta_2_3 | delta_3_4 | TotalIncPoints | AVG   |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
| Bob  |    75 |    77 |    78 |    80 |         2 |         1 |         2 |              5 | 77.50 |
| Sue  |    90 |    97 |    98 |    99 |         7 |         1 |         1 |              9 | 96.00 |
+------+-------+-------+-------+-------+-----------+-----------+-----------+----------------+-------+
2 rows in set (0.00 sec)

It's possible to combine Total Increasing Point TotalIncPoints with AVG. In fact, it's possible to combine all of the example cuts of the data into one SQL statement, which provides additional options for displaying data on your page
select name,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4,

(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))))/2  as AVG1_2,

(sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))))/2 as AVG2_3,

(sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/2 as AVG3_4,

(sum(score*(1-abs(sign(exam-1)))) +
sum(score*(1-abs(sign(exam-2)))) +
sum(score*(1-abs(sign(exam-3)))) +
sum(score*(1-abs(sign(exam-4)))))/4 as AVG

from exams group by name;

+------+-------+-------+-------+-------+--------+--------+--------+-------+
| name | exam1 | exam2 | exam3 | exam4 | AVG1_2 | AVG2_3 | AVG3_4 | AVG   |
+------+-------+-------+-------+-------+--------+--------+--------+-------+
| Bob  |    75 |    77 |    78 |    80 |  76.00 |  77.50 |  79.00 | 77.50 |
| Sue  |    90 |    97 |    98 |    99 |  93.50 |  97.50 |  98.50 | 96.00 |
+------+-------+-------+-------+-------+--------+--------+--------+-------+
2 rows in set (0.00 sec)

Exam scores are listing along with moving averages...again it's all with one select statement.
Good article on "Cross tabulations" or de-normalizing data to show stats: http://dev.mysql.com/tech-resources/articles/wizard/print_version.html
ADOdb (PHP) can generate pivot tables using PivotTableSQL().
For Perl, check DBIx-SQLCrosstab.

沒有留言:

張貼留言