Group Data
Group by
SELECT customer_id FROM payment GROUP BY customer_id ORDER BY customer_id;
When using GROUP BY without any aggregator function (SUM, AVG) it behaves similar to DISTINCT since it puts the customer into 1 group.
postgres@localhost:dvdrental> select customer_id from payment group by customer_id order by customer_id limit 10;
+-------------+
| customer_id |
|-------------|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+-------------+
SELECT 10
Time: 0.006s
It's more beneficial to use with an aggregator function, for example SUM()
postgres@localhost:dvdrental> select customer_id, sum(amount) from payment group by customer_id order by sum(amount) desc limit 10;
+-------------+--------+
| customer_id | sum |
|-------------+--------|
| 148 | 211.55 |
| 526 | 208.58 |
| 178 | 194.61 |
| 137 | 191.62 |
| 144 | 189.60 |
| 459 | 183.63 |
| 181 | 167.67 |
| 410 | 167.62 |
| 236 | 166.61 |
| 403 | 162.67 |
+-------------+--------+
SELECT 10
Time: 0.007s
We can also JOIN with other table like
postgres@localhost:dvdrental> select c.first_name || ' ' || c.last_name customer_name, sum(p.amount) total_amount from customer c join payment p using(customer_id) group by customer_id order by sum(p.amount) desc limit 10;
+----------------+--------------+
| customer_name | total_amount |
|----------------+--------------|
| Eleanor Hunt | 211.55 |
| Karl Seal | 208.58 |
| Marion Snyder | 194.61 |
| Rhonda Kennedy | 191.62 |
| Clara Shaw | 189.60 |
| Tommy Collazo | 183.63 |
| Ana Bradley | 167.67 |
| Curtis Irby | 167.62 |
| Marcia Dean | 166.61 |
| Mike Way | 162.67 |
+----------------+--------------+
SELECT 10
Time: 0.009s
postgres@localhost:dvdrental>
Use with multiple columns
If we want select different fields in the query, we might need to group by those field. For example, if we want to see the sum ofcustomer_id and staff_id. We need to add those in the select query as well
postgres@localhost:dvdrental> SELECT customer_id, staff_id, sum(amount) from payment group by customer_id, staff_id limit 10;
+-------------+----------+-------+
| customer_id | staff_id | sum |
|-------------+----------+-------|
| 1 | 1 | 60.85 |
| 1 | 2 | 53.85 |
| 2 | 1 | 55.86 |
| 2 | 2 | 67.88 |
| 3 | 1 | 59.88 |
| 3 | 2 | 70.88 |
| 4 | 1 | 49.88 |
| 4 | 2 | 31.90 |
| 5 | 1 | 63.86 |
| 5 | 2 | 70.79 |
+-------------+----------+-------+
SELECT 10
Time: 0.007s
in this case, it will group by pair of (customer_id, staff_id)
Group by date
postgres@localhost:dvdrental> select payment_date::date payment_date, sum(amount) sum from payment group by payment_date::date order by payment_date desc lim
it 10;
+--------------+---------+
| payment_date | sum |
|--------------+---------|
| 2007-05-14 | 514.18 |
| 2007-04-30 | 5723.89 |
| 2007-04-29 | 2717.60 |
| 2007-04-28 | 2622.73 |
| 2007-04-27 | 2673.57 |
| 2007-04-26 | 347.21 |
| 2007-04-12 | 1930.48 |
| 2007-04-11 | 1940.32 |
| 2007-04-10 | 1973.18 |
| 2007-04-09 | 2067.86 |
+--------------+---------+
SELECT 10
Time: 0.013s
Having
Having goes with GROUP BY to support conditions
SELECT column1, aggregate_function (column2) FROM table_name GROUP BY column1 HAVING condition;
Difference to WHERE as this only affect to GROUP BY
postgres@localhost:dvdrental> select customer_id, sum(amount) amount from payment group by customer_id having sum(amount) >= 200 order by amount desc limit 1
0;
+-------------+--------+
| customer_id | amount |
|-------------+--------|
| 148 | 211.55 |
| 526 | 208.58 |
+-------------+--------+
SELECT 2
Time: 0.011s
Grouping sets
For example if we have this
postgres@localhost:dvdrental> select * from sales;
+-------+---------+----------+
| brand | segment | quantity |
|-------+---------+----------|
| ABC | Premium | 100 |
| ABC | Basic | 200 |
| XYZ | Premium | 100 |
| XYZ | Basic | 300 |
+-------+---------+----------+
SELECT 4
Time: 0.006s
and you want to see the total by
- Brand
- Brand and segment
- Segment
- All
Then groupping sets will do this for you
postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by grouping sets ((brand, segment), (brand), (segment), ());
+--------+---------+-----+
| brand | segment | sum |
|--------+---------+-----|
| <null> | <null> | 700 |
| XYZ | Basic | 300 |
| ABC | Premium | 100 |
| ABC | Basic | 200 |
| XYZ | Premium | 100 |
| ABC | <null> | 300 |
| XYZ | <null> | 400 |
| <null> | Basic | 500 |
| <null> | Premium | 200 |
+--------+---------+-----+
SELECT 9
Time: 0.007s
In here the null null is total. Where as Brand, null (segment) refer to the brand And null (brand), segment refer to the segment.
Grouping
The grouping(item) function indicate if the item was selected by groupping sets or not. The item must be one that declared in grouping sets
postgres@localhost:dvdrental> select grouping(brand) is_brand_selected, grouping(segment) is_segment_selected, brand, segment, sum(quantity) from sales group by grouping sets ((brand), (segment), (brand, segment), ());
+-------------------+---------------------+--------+---------+-----+
| is_brand_selected | is_segment_selected | brand | segment | sum |
|-------------------+---------------------+--------+---------+-----|
| 1 | 1 | <null> | <null> | 700 |
| 0 | 0 | XYZ | Basic | 300 |
| 0 | 0 | ABC | Premium | 100 |
| 0 | 0 | ABC | Basic | 200 |
| 0 | 0 | XYZ | Premium | 100 |
| 0 | 1 | ABC | <null> | 300 |
| 0 | 1 | XYZ | <null> | 400 |
| 1 | 0 | <null> | Basic | 500 |
| 1 | 0 | <null> | Premium | 200 |
+-------------------+---------------------+--------+---------+-----+
SELECT 9
Time: 0.010s
1is not selected0is selected
CUBE
Cube generate all the possible grouping set for you. For example:
CUBE(c1,c2,c3)
Is the same as
GROUPING SETS ( (c1,c2,c3), (c1,c2), (c1,c3), (c2,c3), (c1), (c2), (c3), () )
postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by cube(brand, segment) order by brand, segment;
+--------+---------+-----+
| brand | segment | sum |
|--------+---------+-----|
| ABC | Basic | 200 |
| ABC | Premium | 100 |
| ABC | <null> | 300 |
| XYZ | Basic | 300 |
| XYZ | Premium | 100 |
| XYZ | <null> | 400 |
| <null> | Basic | 500 |
| <null> | Premium | 200 |
| <null> | <null> | 700 |
+--------+---------+-----+
SELECT 9
Time: 0.018s
Rollup
Similar to CUBE but only generate 4 individual sets. for example. Assuming c1 > c2 > c3.
rollup(c1, c2, c3)
(c1, c2, c3)
(c1, c2)
(c1)
()
postgres@localhost:dvdrental> select brand, segment, sum(quantity) from sales group by rollup(segment, brand) order by brand, segment;
+--------+---------+-----+
| brand | segment | sum |
|--------+---------+-----|
| ABC | Basic | 200 |
| ABC | Premium | 100 |
| XYZ | Basic | 300 |
| XYZ | Premium | 100 |
| <null> | Basic | 500 |
| <null> | Premium | 200 |
| <null> | <null> | 700 |
+--------+---------+-----+
SELECT 7
Time: 0.007s
So because we put rollup(segment, brand). This will generate:
- Everything
- Just the segment
- Segment and brand