Databases
1. Concepts
1.1 Isolation Levels
During parallel execution of transactions, a variety of race conditions can occur. Transaction isolation levels are methods used to prevent these. Each level is characterized by the type of race condition it prevents. Moreover, each level also prevents all the race conditions addressed by the previous levels. It is important to note that each database engine implements these levels differently.
| Dirty Read | Non Repeatable Read | Write Skew (Phantom) | |
|---|---|---|---|
| Read Uncommitted | X | X | X |
| Read Committed | X | X | |
| Repeatable Read | X | X | |
| Serializable |
Read Committed
The lowest of the isolation levels that provides the following guarantees:
- All reads operate on data that has been fully committed.
No dirty reads. - All writes operate on data that has been fully committed.
No dirty writes.
Without these guarantees, the level is referred to as Read Uncommitted.
Dirty Reads
In a dirty read, a transaction reads the uncommitted changes of ongoing transactions. This allows the reading of intermediate states. If any of these transactions then fail, data that should never have existed would have been read.
Dirty Writes
In a dirty write, an update is made to the uncommitted changes of a concurrently running transaction. This poses a problem because, depending on the timing, only parts of each transaction might be applied.
Example:
- Buying a car requires updates to two tables — the listing and the invoice. Two interested parties, A and B, try to
buy the same car at the exact same time.
- Transaction A updates the listing, but then briefly pauses (this may happen due to a CPU context switch).
- Transaction B overwrites A’s update on the listing.
- Transaction B updates the invoice.
- Transaction A resumes and overwrites the invoice.
- Buyer B has purchased the car (last write operation), but buyer A receives the invoice.
Implementation
To implement Read Committed, PostgreSQL, for example, uses Multi-Version Concurrency Control (MVCC) based on timestamps. Transactions only read data that was written before their start.
Repeatable Read
The isolation level above Read Committed, which additionally prevents nonrepeatable reads. Also referred to as Snapshot Isolation because each transaction operates on its own snapshot of the database.
Nonrepeatable Reads (or Read Skew)
A nonrepeatable read (also known as read skew) occurs when an aggregate function (such as SUM) is applied to a range of rows that change during its computation — particularly entries that have already been read. This affects inserts, updates, and deletes equally. If re-executing the function yields a different result, it is considered a nonrepeatable read.
Example:
| ID | Salary |
|---|---|
| 1 | 1000 |
| 2 | 2000 |
| 3 | 3000 |
| 4 | 2500 |
| 5 | 1000 |
A session executes a SUM over the salaries. Meanwhile, the employee with ID 3 is deleted — at the moment the computation reaches ID 4. As a result, an incorrect total salary is calculated.
Nonrepeatable reads are especially dangerous for long-running processes that rely on data integrity, such as backups, analytic workloads, and integrity checks.
Serializable
The highest isolation level, where transactions are executed sequentially — at least according to the standard. In reality, databases deviate from this. For example, PostgreSQL uses monitoring to detect conflicts between concurrently running sessions. In the event of a conflict, one of the two transactions is aborted.
Sequential execution prevents lost updates, write skews, and phantoms. However, these issues can also be avoided through proper locking.
Lost Updates
A lost update is a read-modify race condition on a shared row. Here, one session reads a value as input for a calculation and then updates it. Meanwhile, a parallel session updates the same value between the read and write operations. This intermediate update is lost.
Example:
| ID | Value |
|---|---|
| 1 | 3 |
- Session A reads the value 3 and intends to increase it by 2.
- Between read and the write, a parallel session writes the value 4.
- When Session A writes the value 5, the update to 4 is lost.
Solutions include serialized execution, locking, and atomic operations, although the use of atomic operations is limited to specific cases.
Write Skew (Phantoms)
A write skew is a race condition involving reads on shared entries and writes on separate entries. This applies equally to inserts, updates, and deletes.
Example (1) — materialized:
- A hospital’s shift plan requires that at least one doctor is on call at all times. For one evening, two doctors (A) and (B) are scheduled. Both wish to sign off.
- (A) and (B) attempt to sign off at the same time. The system checks in parallel whether at least one doctor remains on call — which is true in both cases. Both are removed concurrently.
- As a result, no doctor remains on call.
Example (2) — unmaterialized:
- Booking a meeting room is handled through time slots; stored as entries with a start and end time, assigned to a room and a responsible person.
- Two people (A) and (B) try to book the room at the same time. The system checks if a booking exists for the requested time slot — in both cases, none is found. Two new entries are created simultaneously.
The difference between Example (1) and Example (2) is that in (1) the conflict is materialized, while in (2) it is not. This means in (1) there are existing entries that could be locked; in (2) there are no entries yet — and you cannot lock what doesn’t exist.
Solutions include serialized execution and locks. However, locks require a materialized conflict — or they must be applied to the entire table.
Postgres Specifics
Repeatable Read
In PostgreSQL, Repeatable Read is implemented as snapshot isolation using MVCC. Each transaction sees a snapshot of the database taken at the moment it starts. The following points are important:
- Locking
Locks are applied independently of versions. Only entries visible in the transaction’s snapshot are considered — newer entries are ignored. - Exception Handling
When executing updates, deletes, or applying locks, an exception is thrown if a newer version of the affected entries exists outside the snapshot. Read-only queries without locking are not affected.
On the application side, proper error handling must involve retrying the entire transaction in order to work with a newer snapshot.
Serializable
This isolation level extends the snapshot isolation of Repeatable Read by adding conflict monitoring through predicate locks. True serialization is not achieved; instead, an optimistic approach is used, resolving conflicts by aborting transactions when they are detected.
Access to entries is recorded as predicate locks — visible in pg_locks. If a conflict arises, one of the involved transactions is aborted with an error message. Predicate locks do not play a role in deadlocks!
Important points when using Serializable:
- Exception Handling and Consistency
Applications must implement error handling for serialization exceptions by retrying the entire transaction. - Read Consistency
Reads are only considered consistent after a successful commit because a transaction may still be aborted at any time.
This does not apply to read-only transactions. Their reads are consistent from the moment a snapshot is established; these transactions must be flagged as read-only. - Locking
Explicit locking becomes unnecessary when Serializable is used globally. In fact, for performance reasons, explicit locks should be avoided! - Mixing Isolation Levels
Conflict monitoring only applies to transactions running under Serializable. Therefore, it is recommended to set the isolation level globally.
Important:
Keep in mind that sequential scans and long-running transactions can lead to a high number of aborts as the
number of concurrent users increases. Serializable works best with small, fast transactions.