Transaction Behavior in PostgreSQL

PostgreSQL provides a rich set of tools for developers to manage concurrent access to data. Internally, data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session. MVCC, by eschewing the locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments.

The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.

Table- and row-level locking facilities are also available in PostgreSQL for applications which don’t generally need full transaction isolation and prefer to explicitly manage particular points of conflict. However, proper use of MVCC will generally provide better performance than locks. In addition, application-defined advisory locks provide a mechanism for acquiring locks that are not tied to a single transaction.

The SQL standard defines four levels of transaction isolation. The most strict is Serializable, which is defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order. The other three levels are defined in terms of phenomena, resulting from interaction between concurrent transactions, which must not occur at each level. The standard notes that due to the definition of Serializable, none of these phenomena are possible at that level. 

The phenomena which are prohibited at various levels are:

dirty read

A transaction reads data written by a concurrent uncommitted transaction.

nonrepeatable read

A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).

phantom read

A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

serialization anomaly

The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.

Isolation Level Dirt ReadNonrepeatable ReadPhantom ReadSerialization Anomaly
Read CommittedNot Possible Possible Possible Possible
Repeatable ReadNot Possible Not Possible Not Possible Possible
SerializableNot Possible Not Possible Not Possible Not Possible
table 1. Transaction Isolation level in PostgreSQL

Because dirty read does not exist in PostgreSQL, the behaviour of read uncommited and read commited are the same. We are gonna start giving an example for nonrepeatable read, and phantom read.

Nonrepeatable read

When a transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the first read), this is called a nonrepeatable read. Here are some example of nonrepeatable read case:

User 1

db_test=# begin;
BEGIN
db_test=# show transaction_isolation ;

 transaction_isolation
-----------------------
 read committed

(1 row)
db_test=# select * from test2;

 id | a
----+---
  1 |
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |

(10 rows)
db_test=# select * from test2;

 id |   a
----+--------
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
  1 | Suzuki

(10 rows)
imam_test=# commit;
COMMIT

User 2

db_test=# update test2 set a = 'suzuki' where id = 1;
UPDATE 1

What exactly happen there is user 1 start a transaction, while user 1 still in transaction user 2 is changing a row (update) on test2 table (instantly commit), so when user 1 is run select star from test2 he can see the row are changing, this phenomena happen when user 1 still in transaction and has not commit yet, and it is what we called nonrepeatable read.

The transaction isolation level above is read committed, and that is why nonrepeatable read occur. To solve this we can change the isolation level up to repeatable read or serializable (see table 1).

Phantom read

A phantom read is when a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. Let’s see this case in action.

User 1:

db_test=# BEGIN;
BEGIN
db_test=# show transaction_isolation ;
 transaction_isolation
-----------------------
 read committed
(1 row)

imam_test=# select count(*) from test2;
 count
-------
    10
(1 row)
imam_test=# select count(*) from test2;
 count
-------
    11
(1 row)

imam_test=# select * from test2;
 id |   a
----+--------
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
  1 | suzuki
 11 | Trumph

(11 rows)

User 2:

imam_test=# insert into test2 (id, a) values (11, 'Trumph');
INSERT 0 1

From above we can see that user 1 try to count rows on test2 and get the result for 10 rows, in the other hand user 2 insert new data to test2, and soon after that user 1 count the row again and get the different result, it become 11 rows, this case is called phantom read. The isolation level we used above are default PostgreSQL, which is read commited, if we do not want phantom read happen we can change the transaction isolation level to serializable, to choose serializable level we can use this command on the beginning of a transaction:

imam_test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN

Serialization Anomaly

Finally, a serialization anomaly can occur when the order in which you run transactions changes the resulting value stored in the database. In this example, we have two transactions, which would result in a different result being written in the database depending on the order in which the transactions completed.

User 1

imam_test=# BEGIN ;
BEGIN

imam_test=# select * from test2;

 id |   a
----+--------
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
  1 | suzuki
 11 | Trumph
(11 rows)

imam_test=# update test2 SET a = 'yamaha' where id = 2;
UPDATE 1

Here user 1 commit the transaction

imam_test=# commit;
COMMIT

we check the test2 table on user 1 (user 2 have not committed transaction)

imam_test=# select * from test2;

 id |   a
----+--------
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
  1 | suzuki
 11 | Trumph
  2 | yamaha
(11 rows)

Run select * from test2

imam_test=# select * from test2;
 id |   a
----+--------
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
  1 | suzuki
 11 | Trumph
  2 | honda
(11 rows)

User 2

imam_test=# BEGIN ;
BEGIN

imam_test=# update test2 SET a = 'honda' where id = 2;

At this point, user 2 try to update the same row as user 1, but because user 1 has not committed the transaction yet user 2 will wait until transaction in user 1 is finish, and the prompt UPDATE 1 will appear in user 2 console.

because user 1 already committed the transaction, now update query on user 2 can be executed

imam_test=# update test2 SET a = 'honda' where id = 2;
UPDATE 1

User 2 commit the transaction

imam_test=# commit;
COMMIT

From example above, the value for id = 2 became ‘honda’, it change since user 2 committed the transaction, it override ‘yamaha’ before, without warning message, case like this we called serialization anomaly. This phenomena can lead to data inconsistency. To avoid this situation we can set isolation level to serializable

If we set the isolation level to serializable, soon after update on user 2 execute this message will appear

ERROR:  could not serialize access due to concurrent update

The message tell us there is a transaction accessing the same row, preventing serialization anomaly to occur. Then user 2 will know that other user still accessing that row, so user 2 have to repeat the whole transaction from the beginning.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>