Postgres Concurrency : What could go wrong (part 3)

Costa Rica and Ethiopia

After a while in the company, playing with the database and writing business reports, a day, you encounter an issue. In the beginning, you were checking the total number of prices of Ethiopian producers.

Ethiopia is considered the cradle of coffee. Clients particularly appreciate Ethiopian coffee.

Did you know Ethiopia is responsible for almost 5% of worldwide production?

Then knowing the number of producers is essential and meaningful.

SELECT COUNT(*) FROM coffee_producer 
WHERE country = 'Ethiopia';

After this, you are asked to write a transaction. In this transaction, you calculate the number of producers in Ethiopia, among other things. But you also update the unit price of coffees from a producer in Costa Rica.

START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- do stuff
SELECT COUNT(*) FROM coffee_productor 
WHERE country = 'Ethiopia';
-- do other stuff
UPDATE coffee_productor SET unit_price = '40' 
WHERE country = 'Costa Rica';
END;

Bob's transaction

At the same time, Bob has been in charge of roughly the same as you, knowing the number of coffee producers in Costa Rica and inserting a new producer among the Ethiopian ones.

Costa Rican Coffee weighs less than Ethiopian coffee, only 1% of worldwide coffee production.

At the beginning of 2000, a lot was done to improve the quality of Costa Rican coffee, reducing the impact on nature and giving more ownership to the coffee producer in Costa Rica.

Bob, Your colleague, is aware of isolation levels and issues since your precedent adventure with him. He wrote a transaction with an isolation level of repeatable read.

START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT SUM(unit_price) FROM coffees 
WHERE country = 'Costa Rica';
INSERT INTO coffee_producers 
VALUES ('Tom Bama,''Ethiopia')
END;

At this moment, you did not know that your colleague was writing this transaction, and you started your transaction.

A weird result

When you launch, your transaction result differs from the first query about the total number of Ethiopian producers. Something is strange here; you knew that when you launched your transaction, nothing was happening on the database.

How could Bob manage to launch his transaction?

Moreover, what would have happened if your transaction had finished before Bob's one?

Indeed if your transaction had finished before Bob's, it would have led to another situation.

Here you queried :

  • the total number of Ethiopian coffee producers then

  • updated another subset of the data: the unit_price of Costa Rican coffee.

Bob did the opposite of you :

  • he queried the number of the producer of Costa Rican coffee then

  • inserted another subset of the data: a new Ethiopian coffee producer.

Serialization issue

This issue is known as a serialization issue. In this case, you have to launch a transaction and access a subset of the data and put a lock on them.

Then another transaction takes another subset of data with a lock but in the same table. And when you release the safety, it can access your subset of data.

But then you access the data that was locked by another transaction. Depending on the order, you can have one result or another.

Is this different from Read Repeatable ?

Read Repeatable cannot help us because there are no two reads on the same subset of data and the same filtering condition. But we see that these transactions are linked together.

And you are likely to want that if one begins first, it will be consistent until the end.

Solving the Serialization issue

In Postgres, there is another isolation level to prevent this. This is SERIALIZABLE

START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- do stuff
SELECT COUNT(*) FROM coffee_productor 
WHERE country = 'Ethiopia';
-- do other stuff
UPDATE coffee_productor SET unit_price = '40' 
WHERE country = 'Costa Rica';
END;

Now Postgres will monitor to ensure no entangled transaction will be executed simultaneously. If two transactions manipulate the same data, this will lead to the failure of the transaction that commits last.

The end

This article has been challenging to write. Serialization issues are peculiar, and finding a simple yet meaningful example has been difficult. Nevertheless, I hope this will be useful for you to understand Isolation issues deeply.

It was an entertaining series to write mainly because I introduced storytelling. I hope you enjoyed it.

We should remember that concurrency can be challenging to handle, but adding more constraint and checks also mean a drop in performance.

This could mean that you should constantly monitor your transaction and ensure that there is a retry mechanism for them.
That said, databases are a strategic piece of the software architecture, and monitoring them is necessary.

The Isolation principle of acId has no secrets for you and your colleague Bob, and you can now face a large panel of concurrency issues.

Keep in Touch

On Twitter : [@yet_anotherDev] (https://twitter.com/yet_anotherDev)