Group Data

Group by

SELECT customer_id FROM payment GROUP BY customer_id ORDER BY customer_id;

When using GROUP BY without any aggregator function (SUM, AVG) it behaves similar to DISTINCT since it puts the customer into 1 group.

postgres@localhost:dvdrental> select customer_id from payment group by customer_id order by customer_id limit 10;
+-------------+
| customer_id |
|-------------|
| 1           |
| 2           |
| 3           |
| 4           |
| 5           |
| 6           |
| 7           |
| 8           |
| 9           |
| 10          |
+-------------+
SELECT 10
Time: 0.006s

It's more beneficial to use with an aggregator function, for example SUM()

postgres@localhost:dvdrental> select customer_id, sum(amount) from payment group by customer_id order by sum(amount) desc limit 10;
+-------------+--------+
| customer_id | sum    |
|-------------+--------|
| 148         | 211.55 |
| 526         | 208.58 |
| 178         | 194.61 |
| 137         | 191.62 |
| 144         | 189.60 |
| 459         | 183.63 |
| 181         | 167.67 |
| 410         | 167.62 |
| 236         | 166.61 |
| 403         | 162.67 |
+-------------+--------+
SELECT 10
Time: 0.007s

We can also JOIN with other table like

postgres@localhost:dvdrental> select c.first_name || ' ' || c.last_name customer_name, sum(p.amount) total_amount from customer c join payment p using(customer_id) group by customer_id order by sum(p.amount) desc limit 10;
+----------------+--------------+
| customer_name  | total_amount |
|----------------+--------------|
| Eleanor Hunt   | 211.55       |
| Karl Seal      | 208.58       |
| Marion Snyder  | 194.61       |
| Rhonda Kennedy | 191.62       |
| Clara Shaw     | 189.60       |
| Tommy Collazo  | 183.63       |
| Ana Bradley    | 167.67       |
| Curtis Irby    | 167.62       |
| Marcia Dean    | 166.61       |
| Mike Way       | 162.67       |
+----------------+--------------+
SELECT 10
Time: 0.009s
postgres@localhost:dvdrental>

Use with multiple columns

If we want select different fields in the query, we might need to group by those field. For example, if we want to see the sum ofcustomer_id and staff_id. We need to add those in the select query as well

postgres@localhost:dvdrental> SELECT customer_id, staff_id, sum(amount) from payment group by customer_id, staff_id limit 10;
+-------------+----------+-------+
| customer_id | staff_id | sum   |
|-------------+----------+-------|
| 1           | 1        | 60.85 |
| 1           | 2        | 53.85 |
| 2           | 1        | 55.86 |
| 2           | 2        | 67.88 |
| 3           | 1        | 59.88 |
| 3           | 2        | 70.88 |
| 4           | 1        | 49.88 |
| 4           | 2        | 31.90 |
| 5           | 1        | 63.86 |
| 5           | 2        | 70.79 |
+-------------+----------+-------+
SELECT 10
Time: 0.007s

in this case, it will group by pair of (customer_id, staff_id)

Group by date

postgres@localhost:dvdrental> select payment_date::date payment_date, sum(amount) sum from payment group by payment_date::date order by payment_date desc lim
 it 10;
+--------------+---------+
| payment_date | sum     |
|--------------+---------|
| 2007-05-14   | 514.18  |
| 2007-04-30   | 5723.89 |
| 2007-04-29   | 2717.60 |
| 2007-04-28   | 2622.73 |
| 2007-04-27   | 2673.57 |
| 2007-04-26   | 347.21  |
| 2007-04-12   | 1930.48 |
| 2007-04-11   | 1940.32 |
| 2007-04-10   | 1973.18 |
| 2007-04-09   | 2067.86 |
+--------------+---------+
SELECT 10
Time: 0.013s

Having

Having goes with GROUP BY to support conditions

SELECT column1, aggregate_function (column2) FROM table_name GROUP BY column1 HAVING condition;

Difference to WHERE as this only affect to GROUP BY

postgres@localhost:dvdrental> select customer_id, sum(amount) amount from payment group by customer_id having sum(amount) >= 200 order by amount desc limit 1
 0;
+-------------+--------+
| customer_id | amount |
|-------------+--------|
| 148         | 211.55 |
| 526         | 208.58 |
+-------------+--------+
SELECT 2
Time: 0.011s

Grouping sets

For example if we have this

postgres@localhost:dvdrental> select * from sales;
+-------+---------+----------+
| brand | segment | quantity |
|-------+---------+----------|
| ABC   | Premium | 100      |
| ABC   | Basic   | 200      |
| XYZ   | Premium | 100      |
| XYZ   | Basic   | 300      |
+-------+---------+----------+
SELECT 4
Time: 0.006s

and you want to see the total by

  1. Brand
  2. Brand and segment
  3. Segment
  4. All

Then groupping sets will do this for you

postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by grouping sets ((brand, segment), (brand), (segment), ());
+--------+---------+-----+
| brand  | segment | sum |
|--------+---------+-----|
| <null> | <null>  | 700 |
| XYZ    | Basic   | 300 |
| ABC    | Premium | 100 |
| ABC    | Basic   | 200 |
| XYZ    | Premium | 100 |
| ABC    | <null>  | 300 |
| XYZ    | <null>  | 400 |
| <null> | Basic   | 500 |
| <null> | Premium | 200 |
+--------+---------+-----+
SELECT 9
Time: 0.007s

In here the null null is total. Where as Brand, null (segment) refer to the brand And null (brand), segment refer to the segment.

Grouping

The grouping(item) function indicate if the item was selected by groupping sets or not. The item must be one that declared in grouping sets

postgres@localhost:dvdrental> select grouping(brand) is_brand_selected, grouping(segment) is_segment_selected, brand, segment, sum(quantity) from sales group by grouping sets ((brand), (segment), (brand, segment), ());
+-------------------+---------------------+--------+---------+-----+
| is_brand_selected | is_segment_selected | brand  | segment | sum |
|-------------------+---------------------+--------+---------+-----|
| 1                 | 1                   | <null> | <null>  | 700 |
| 0                 | 0                   | XYZ    | Basic   | 300 |
| 0                 | 0                   | ABC    | Premium | 100 |
| 0                 | 0                   | ABC    | Basic   | 200 |
| 0                 | 0                   | XYZ    | Premium | 100 |
| 0                 | 1                   | ABC    | <null>  | 300 |
| 0                 | 1                   | XYZ    | <null>  | 400 |
| 1                 | 0                   | <null> | Basic   | 500 |
| 1                 | 0                   | <null> | Premium | 200 |
+-------------------+---------------------+--------+---------+-----+
SELECT 9
Time: 0.010s

  • 1 is not selected
  • 0 is selected

CUBE

Cube generate all the possible grouping set for you. For example:

CUBE(c1,c2,c3)

Is the same as

GROUPING SETS ( (c1,c2,c3), (c1,c2), (c1,c3), (c2,c3), (c1), (c2), (c3), () )
postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by cube(brand, segment) order by brand, segment;
+--------+---------+-----+
| brand  | segment | sum |
|--------+---------+-----|
| ABC    | Basic   | 200 |
| ABC    | Premium | 100 |
| ABC    | <null>  | 300 |
| XYZ    | Basic   | 300 |
| XYZ    | Premium | 100 |
| XYZ    | <null>  | 400 |
| <null> | Basic   | 500 |
| <null> | Premium | 200 |
| <null> | <null>  | 700 |
+--------+---------+-----+
SELECT 9
Time: 0.018s

Rollup

Similar to CUBE but only generate 4 individual sets. for example. Assuming c1 > c2 > c3.

rollup(c1, c2, c3)
(c1, c2, c3)
(c1, c2)
(c1)
()
postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by rollup(segment, brand) order by brand, segment;
+--------+---------+-----+
| brand  | segment | sum |
|--------+---------+-----|
| ABC    | Basic   | 200 |
| ABC    | Premium | 100 |
| XYZ    | Basic   | 300 |
| XYZ    | Premium | 100 |
| <null> | Basic   | 500 |
| <null> | Premium | 200 |
| <null> | <null>  | 700 |
+--------+---------+-----+
SELECT 7
Time: 0.007s

So because we put rollup(segment, brand). This will generate:

  1. Everything
  2. Just the segment
  3. Segment and brand