Pivot Table in PostgreSQL

ยท

4 min read

Introduction

Today we are going to see a really important concept when working with data and how you represent data. We are going to talk about Pivot Table indeed representing data in columns instead of rows is often more readable.

Example

Let's say you run a coffee shop and you want to know what is the sales of each coffee over 2 years. We have an OLAP database with a (very simple) star schema with the fact table coffee_sales and 2 dimensions a coffees dimension and a time (year here) dimension.

CREATE TABLE coffee_dim (
    coffee_id INTEGER PRIMARY KEY,
    name text,
        unit_price NUMERIC
);

CREATE TABLE date_dim (
    date_id INTEGER PRIMARY KEY,
    year integer
);

CREATE TABLE coffees_sales (
    sales_id INTEGER PRIMARY KEY,
    coffee_id INTEGER REFERENCES coffee_dim(coffee_id),
        date_id INTEGER REFERENCES date_dim(date_id),
    quantity INTEGER
)

If we visualize it in a ERD tool we can see the the fact table coffee_sales and the 2 dimensions like this :

Image description

Generate a report over the years

As we said we want to generate a report about the sales of our coffees over the year. Something like the following data:

Image description

It is not really readable, isn't it? These data are generated by a query that is more or less looking like this :

SELECT c.name name,d.year as year ,sum(cs.quantity)::INTEGER
    FROM coffees_sales cs
    JOIN coffees c
    USING(coffee_id)
    JOIN date_dim d
    USING(date_id)
    GROUP BY name, year
    ORDER BY 1,2

Pivoting our data over the year would be much more readable here to do that we have several solutions. We will dive into two of them one which works with plain sql and another one that needs Postgres extensions. The precedent SQL query will be the base of the solutions we will see.

Pivoting with CTE

One of the approaches we could use is CTE. Indeed if we define a CTE for each amount for each year we can then query the different amounts and name and name the amount column with the year where comes these data. Let's see what it looks like in a concrete example.

WITH cte_2020 as (SELECT c.name name,d.year as year ,sum(cs.quantity)::INTEGER
    FROM coffees_sales cs
    JOIN coffee_dim c
    USING(coffee_id)
    JOIN date_dim d
    ON d.date_id = cs.date_id
        AND YEAR=2020
    GROUP BY name, year
    ORDER BY 1,2),
cte_2021 as (SELECT c.name name,d.year as year ,sum(cs.quantity)::INTEGER
    FROM coffees_sales cs
    JOIN coffee_dim c
    USING(coffee_id)
    JOIN date_dim d
    ON d.date_id = cs.date_id
        AND YEAR=2021
    GROUP BY name, year
    ORDER BY 1,2)
select c1.name, c1.sum as "2020",  c2.sum as "2021"
from cte_2020 c1 join cte_2021 c2
on c1.name = c2.name

You are probably thinking that it is a lot of effort to not much moreover what if you want 2022 also? Here comes Crosstab!

Pivoting with Crosstab extension

You can create a pivot table in Postgres with the crosstab extension.

First, you will need to import the tablefunc extension of Postgres. And then you just have to use the CROSSTAB function like this :

CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB(/**/) AS ct (name text, "2020" INTEGER,
 "2021" INTEGER)

The query for the name, year, and number of sales will lead to the following final query :

CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
    SELECT c.name name,d.year as year ,sum(cs.quantity)::INTEGER
    FROM coffees_sales cs
    JOIN coffee_dim c
    USING(coffee_id)
    JOIN date_dim d
    USING(date_id)
    GROUP BY name, year
    ORDER BY 1,2
$$)
AS ct (name text, "2020" INTEGER, "2021" INTEGER)

This query will generate this data:

Image description

It is a bit clearer and readable don't you think?

So with CROSSTAB, it is a lot clearer and it is easier to respect DRY principles. Nevertheless, it took me some time to understand how to use CROSSTAB and pivot my data.

Whereas it is much more straightforward thanks to CTE you just have to query your data filter it and give the column the name you want.

Conclusion

Pivoting is a crucial technique to know in data analysis. First, we have seen how to pivot our data with CTEs, this method can be applied to any database but is not scalable if we have a lot of different values in the column we want to pivot.

Then we have seen how to pivot our data thanks to the CROSSTAB function presents in the Postgres tablefunc EXTENSION. This method is more scalable, and it will be easier to respect the DRY principles.

Nevertheless, the learning curve is a bit more complicated with CROSSTAB. Maybe if you do not pivot data often or you do not have a lot of values in the column you want to pivot it is not worth it. ๐Ÿ˜Ž

##Keep in Touch On Twitter : @yet_anotherdev

ย