Knowledge Base

A collection of selected topics that can support further development or the creation of a custom player.



1 - Idempotent Event Consumer

Consumers must be able to process events idempotently. In general, an at least once delivery guarantee is assumed. This means that a consumer may receive the same event multiple times.

There are several reasons for this:

  • Typically, consumers receive and acknowledge events in batches. If processing is interrupted, the same batch will be delivered again. If some events in the batch have already been processed, a non-idempotent consumer will process them again.

  • Some brokers (including Kafka) support transactions, allowing receiving, acknowledging, and sending new events to occur atomically. However, it’s not that simple:

    • First, these transactions are very costly and lead to significantly higher latencies. In larger systems, those costs add up. They arise because processing no longer happens in batches: each event is transmitted individually, accompanied by additional coordination calls similar to a two-phase commit.

    • Second, this approach only works as long as a single technology is used. Otherwise, a transaction manager is required (this is precisely the difference between local and global transactions). Without one, events may be lost or published multiple times, bringing us back to the original problem. (See atomic event publication.)

Idempotent Business Logic

In most (or even all?) cases, business logic can be made idempotent. It’s important to distinguish between absolute and relative state changes.

  • Absolute state changes:
    With an absolute state change, a value is typically replaced by a new one — such as an address. These operations are always idempotent.

  • Relative state changes:
    A relative state change occurs when, for example, an amount is deducted from an account balance. Reprocessing the event would alter the balance again. Alternatively, an account balance can be modeled as a sequence of transactions. If the event ID is unique and part of the SQL schema, reprocessing would result in an error — this is essentially event sourcing.
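The transaction-sequence approach can be sketched with an in-memory SQLite table; the schema and function names here are invented for illustration. Because the event ID is the primary key, reprocessing a delivered event fails with a constraint violation instead of deducting the amount twice:

```python
import sqlite3

# Sketch: an account balance modeled as a sequence of transactions,
# with the unique event ID as part of the SQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_tx (
        event_id TEXT PRIMARY KEY,  -- unique event ID enforces idempotency
        amount   INTEGER NOT NULL
    )
""")

def apply_event(event_id: str, amount: int) -> bool:
    """Apply a relative state change; return False if already processed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account_tx (event_id, amount) VALUES (?, ?)",
                (event_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, safely ignored

apply_event("evt-1", -50)
apply_event("evt-1", -50)  # redelivery: rejected, balance unchanged
balance = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM account_tx").fetchone()[0]
```

The current balance is then derived as the sum over the sequence, which is exactly the event-sourcing view mentioned above.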

Remembering Processed Events

Alternatively, the IDs of processed events can be stored in the database. Before processing, a lookup can be performed to check whether the event has already been handled.

For consistency, it’s crucial that storing the event ID happens within the same transaction as all other domain changes.

A prerequisite for a simple solution without locking is that the same event cannot be processed in parallel by different consumers. With brokers like Kafka, this is guaranteed — there can only be one consumer per partition.
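A minimal sketch of this lookup, again with invented table names and an in-memory SQLite database: the dedup record and the domain change share one local transaction, so a crash between the two cannot occur.

```python
import sqlite3

# Sketch: the ID of each processed event is stored in the SAME local
# transaction as the domain change; a redelivery is detected up front.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE balance (id INTEGER PRIMARY KEY CHECK (id = 1), value INTEGER);
    INSERT INTO balance VALUES (1, 100);
""")

def consume(event_id: str, amount: int) -> bool:
    try:
        with conn:  # one transaction for dedup record + domain change
            conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            conn.execute(
                "UPDATE balance SET value = value + ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # event already handled, nothing is applied

consume("evt-7", -30)
consume("evt-7", -30)  # at-least-once redelivery: no second deduction
value = conn.execute("SELECT value FROM balance").fetchone()[0]
```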

Transactional Event Processing

Another alternative is the use of local (or global) transactions, accepting the associated drawbacks — assuming the broker supports transactions at all.

It’s important to note that when using multiple technologies, a transaction manager is required; local transactions only work within a single technology, in this case the broker. That’s why exactly once processing works in Kafka Streams, where Kafka also takes on the role of the database.

2 - Atomic Event Publication

Every change in a domain leads to the publication of an event in an event-driven architecture.

For the overall system to remain consistent, it would be ideal if the change and the publication occurred atomically within the same transaction. This would provide the highest possible level of consistency.

However, this is not easily achievable as database and broker are typically separate systems. To enable a global transaction across both, a transaction manager would be needed.

💡 A local transaction refers to a transaction within a single system (or technology), such as a database. When multiple systems are involved, separate local transactions are created independently. What’s missing is a mechanism to coordinate them. That’s where the transaction manager comes in — a separate system that synchronizes local transactions into a global transaction. This is the only way to achieve exactly once semantics across systems. All other approaches relax this guarantee.

The problem is very similar to ACID and CAP considerations in distributed systems, where trade-offs are made between different properties—or guarantees.

What are these guarantees?

  • Ordering Guarantees
    A guarantee regarding the order of processing—whether it must match the original processing sequence or not.

  • Delivery / Publication Guarantees
    A guarantee regarding the delivery of events—whether events may be delivered multiple times, exactly once, or at most once.

  • Read Your Writes
    A guarantee about immediate consistency — i.e., whether a system can read its own changes right after writing them.

It’s important to note that some guarantees exist along a spectrum. The theory is much more detailed, so this section serves merely as an introduction. There are three approaches to solving this problem. Each fulfills different guarantees and comes with its own trade-offs.

Transaction Manager

A transaction manager coordinates local transactions across independent systems — essentially acting as the orchestrator of a two-phase commit (2PC).

exactly once in-order - A transaction manager provides by far the highest level of consistency between two systems. For this reason, it is often used in banking systems.

Disadvantages

  • Performance
    A transaction manager introduces significant overhead, as it is a separate service that orchestrates a 2PC between systems. In Kafka, for example, using local transactions would also mean that events could no longer be processed in batches.

  • Cost
    Transaction managers are often commercial products — there are few, if any, free alternatives.

  • Support
    The involved systems must implement certain standards to integrate with a transaction manager.

Transactional Outbox Pattern

With a transactional outbox, the problem of independent transactions is deferred by storing events in the database as part of the same local transaction. A separate process reads and publishes these events in order.

at least once in-order – The storage of events happens exactly once, while their publication is at least once. The problem of independent transactions is effectively shifted by this pattern. As long as reading and publishing do not happen concurrently, the event order is preserved.
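A minimal sketch of the pattern, using an in-memory SQLite database; all table and function names are assumptions for the example. The domain change and the outbox entry share one local transaction, and a separate relay publishes pending events in insertion order:

```python
import sqlite3

# Sketch of a transactional outbox with a simple polling relay.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (
        seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves event order
        payload TEXT NOT NULL
    );
""")

def place_order(order_id: int) -> None:
    with conn:  # atomic: both rows are written, or neither
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f"OrderPlaced:{order_id}",))

def relay(publish) -> None:
    """Polling relay: publish pending events in order, then delete them."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
    for seq, payload in rows:
        publish(payload)  # a crash right here re-publishes: at least once
        with conn:
            conn.execute("DELETE FROM outbox WHERE seq = ?", (seq,))

place_order(1)
place_order(2)
published = []
relay(published.append)
```

Note where the at-least-once semantics come from: if the relay crashes after publishing but before deleting, the event is published again on the next poll.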

Types of Transactional Outboxes

  • Polling Outbox
    The database is regularly polled for new events. This approach has higher latency and significantly increased resource consumption.

  • Tailing or Subscriber Outbox
    This type subscribes to a change stream of the database. Databases like Redis and MongoDB offer built-in subscriber mechanisms. For others, the write-ahead log can be used to achieve the same effect.

Further Considerations and Challenges

  • Scaling
    As throughput increases, scaling and latency can become issues. Depending on the context, sharding may help—but it’s important that the outbox sharding aligns with the broker’s sharding. Otherwise, event ordering may break.

  • Single Point of Failure
    A single outbox instance represents a single point of failure. Therefore, a cluster with leader election is needed to ensure high availability and low latency.

Event Persistence

Similar to the outbox pattern, events are stored in the database along with an additional status field. Unlike the outbox, however, events are published by the same process, followed by a status update.

An additional process periodically scans the database for unpublished events and publishes them as a fallback.

at least once out of order – Due to the nature of the publishing mechanism, maintaining the correct order is a best effort.

💡 Incidentally, this is the mechanism Spring Modulith uses for its internal eventing. Something to keep in mind.
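The mechanism can be sketched as follows (in-memory SQLite, invented schema). The happy path publishes directly after the commit; a periodic fallback scan re-publishes anything still marked NEW, which is why ordering is only best effort:

```python
import sqlite3

# Sketch of event persistence with a status column and a fallback scan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")

def save_and_publish(payload: str, publish) -> None:
    with conn:  # event stored atomically with the (omitted) domain change
        cur = conn.execute(
            "INSERT INTO events (payload, status) VALUES (?, 'NEW')", (payload,))
        event_id = cur.lastrowid
    publish(payload)  # a crash before the next statement leaves status = NEW
    with conn:
        conn.execute("UPDATE events SET status = 'SENT' WHERE id = ?", (event_id,))

def fallback_scan(publish) -> None:
    """Periodic job: re-publish unsent events (order is best effort)."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE status = 'NEW'").fetchall()
    for event_id, payload in rows:
        publish(payload)
        with conn:
            conn.execute(
                "UPDATE events SET status = 'SENT' WHERE id = ?", (event_id,))

sent = []
save_and_publish("e1", sent.append)
# Simulate a crash: an event was stored but never published or marked SENT.
with conn:
    conn.execute("INSERT INTO events (payload, status) VALUES ('e2', 'NEW')")
fallback_scan(sent.append)
```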

Separate Transaction

Another option is to more or less ignore the problem.

at most once in-order – Both local transactions would — if possible — be nested, so that a failure in the inner transaction leads to a rollback of the outer one. However, if the outer transaction fails after the inner one has been executed, this will inevitably lead to inconsistency.

💡 This is exactly what happens when using @Transactional in Spring-Kafka and Spring-Data without an external transaction manager.

3 - Database Isolation Levels

During parallel execution of transactions, a variety of race conditions can occur. Transaction isolation levels are methods used to prevent these. Each level is characterized by the type of race condition it prevents. Moreover, each level also prevents all the race conditions addressed by the previous levels. It is important to note that each database engine implements these levels differently.

Isolation Level    | Dirty Read | Non-Repeatable Read | Write Skew (Phantom)
Read Uncommitted   | X          | X                   | X
Read Committed     |            | X                   | X
Repeatable Read    |            |                     | X
Serializable       |            |                     |

(X = the anomaly can still occur at this level)

Read Committed

The lowest of the isolation levels that provides the following guarantees:

  1. All reads operate on data that has been fully committed.
    No dirty reads.
  2. All writes operate on data that has been fully committed.
    No dirty writes.

Without these guarantees, the level is referred to as Read Uncommitted.

Dirty Reads

In a dirty read, a transaction reads the uncommitted changes of ongoing transactions. This allows the reading of intermediate states. If any of these transactions then fail, data that should never have existed would have been read.

Dirty Writes

In a dirty write, an update is made to the uncommitted changes of a concurrently running transaction. This poses a problem because, depending on the timing, only parts of each transaction might be applied.

Example:

  • Buying a car requires updates to two tables — the listing and the invoice. Two interested parties, A and B, try to buy the same car at the exact same time.
    • Transaction A updates the listing, but then briefly pauses (this may happen due to a CPU context switch).
    • Transaction B overwrites A’s update on the listing.
    • Transaction B updates the invoice.
    • Transaction A resumes and overwrites the invoice.
  • Buyer B has purchased the car (last write operation), but buyer A receives the invoice.
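The interleaving above can be traced with a toy simulation; a plain dict stands in for the two tables, and no real transactions or locks are involved, which is exactly why the anomaly occurs:

```python
# Simulation of the dirty-write interleaving: two "transactions" write to
# the listing and the invoice without any write locks.
state = {"listing_buyer": None, "invoice_buyer": None}

state["listing_buyer"] = "A"   # A updates the listing, then pauses
state["listing_buyer"] = "B"   # B overwrites A's uncommitted update
state["invoice_buyer"] = "B"   # B updates the invoice
state["invoice_buyer"] = "A"   # A resumes and overwrites the invoice
# Result: B "owns" the car, but A receives the invoice.
```

With dirty writes prevented, B's transaction would have to wait for A's row locks, so both rows would end up belonging to the same buyer.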

To implement Read Committed, PostgreSQL, for example, uses Multi-Version Concurrency Control (MVCC) based on timestamps. Transactions only read data that was written before their start.


Repeatable Read

The isolation level above Read Committed, which additionally prevents nonrepeatable reads. Also referred to as Snapshot Isolation because each transaction operates on its own snapshot of the database.

Nonrepeatable Reads (or Read Skew)

A nonrepeatable read (also known as read skew) occurs when re-reading data within the same transaction yields a different result. A typical case is an aggregate function (such as SUM) applied to a range of rows that change during its computation, particularly entries that have already been read. This affects inserts, updates, and deletes equally.

Example:

ID | Salary
1  | 1000
2  | 2000
3  | 3000
4  | 2500
5  | 1000

A session executes a SUM over the salaries. Meanwhile, the employee with ID 3 is deleted — at the moment the computation reaches ID 4. As a result, an incorrect total salary is calculated.
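A toy illustration of why the read is "nonrepeatable": running the same aggregate twice inside one transaction gives different answers when another session deletes a row in between (a dict stands in for the salary table):

```python
# Simulation of read skew: the same SUM, executed twice, disagrees because
# a concurrent session deleted the row with ID 3 between the two reads.
salaries = {1: 1000, 2: 2000, 3: 3000, 4: 2500, 5: 1000}

first = sum(salaries.values())    # total over the original rows
del salaries[3]                   # concurrent committed delete
second = sum(salaries.values())   # re-execution yields a different total
```

Under snapshot isolation, both executions would operate on the snapshot taken at transaction start and return the same value.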

Nonrepeatable reads are especially dangerous for long-running processes that rely on data integrity, such as backups, analytic workloads, and integrity checks.


Serializable

The highest isolation level, where transactions are executed sequentially — at least according to the standard. In reality, databases deviate from this. For example, PostgreSQL uses monitoring to detect conflicts between concurrently running sessions. In the event of a conflict, one of the two transactions is aborted.

Sequential execution prevents lost updates, write skews, and phantoms. However, these issues can also be avoided through proper locking.

Lost Updates

A lost update is a read-modify-write race condition on a shared row. Here, one session reads a value as input for a calculation and then updates it. Meanwhile, a parallel session updates the same value between the read and write operations. This intermediate update is lost.

Example:

  • Session A reads the value 3 and intends to increase it by 2.
  • Between the read and the write, a parallel session writes the value 4.
  • When Session A writes the value 5, the update to 4 is lost.

Solutions include serialized execution, locking, and atomic operations, although the use of atomic operations is limited to specific cases.
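The difference between the read-modify-write race and an atomic operation can be shown in a small SQLite sketch (the "parallel session" is simulated by a write squeezed in between; table name invented):

```python
import sqlite3

# Sketch: lost update via read-modify-write vs. an atomic in-database update.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counter VALUES (1, 3)")
conn.commit()

# Read-modify-write: Session A reads 3 ...
value = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]
conn.execute("UPDATE counter SET value = 4 WHERE id = 1")   # ... parallel write
conn.execute("UPDATE counter SET value = ? WHERE id = 1", (value + 2,))
conn.commit()
lost = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]
# lost == 5: the intermediate update to 4 is silently overwritten.

# Atomic operation: the increment happens inside the database; no prior read.
conn.execute("UPDATE counter SET value = 1 WHERE id = 1")
conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")  # parallel
conn.execute("UPDATE counter SET value = value + 2 WHERE id = 1")  # Session A
conn.commit()
safe = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]
# safe == 4: both increments are preserved (1 + 1 + 2).
```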

Write Skew (Phantoms)

A write skew is a race condition involving reads on shared entries and writes on separate entries. This applies equally to inserts, updates, and deletes.

Example (1) — materialized:

  • A hospital’s shift plan requires that at least one doctor is on call at all times. For one evening, two doctors (A) and (B) are scheduled. Both wish to sign off.
  • (A) and (B) attempt to sign off at the same time. The system checks in parallel whether at least one doctor remains on call — which is true in both cases. Both are removed concurrently.
  • As a result, no doctor remains on call.

Example (2) — unmaterialized:

  • Booking a meeting room is handled through time slots; stored as entries with a start and end time, assigned to a room and a responsible person.
  • Two people (A) and (B) try to book the room at the same time. The system checks if a booking exists for the requested time slot — in both cases, none is found. Two new entries are created simultaneously. As a result, the room is double-booked.

The difference between Example (1) and Example (2) is that in (1) the conflict is materialized, while in (2) it is not. This means in (1) there are existing entries that could be locked; in (2) there are no entries yet — and you cannot lock what doesn’t exist.

Solutions include serialized execution and locks. However, locks require a materialized conflict — or they must be applied to the entire table.


Postgres Specifics

Repeatable Read

In PostgreSQL, Repeatable Read is implemented as snapshot isolation using MVCC. Each transaction sees a snapshot of the database taken at the moment it starts. The following points are important:

  • Locking
    Locks are applied independently of versions. Only entries visible in the transaction’s snapshot are considered — newer entries are ignored.
  • Exception Handling
    When executing updates, deletes, or applying locks, an exception is thrown if a newer version of the affected entries exists outside the snapshot. Read-only queries without locking are not affected.
    On the application side, proper error handling must involve retrying the entire transaction in order to work with a newer snapshot.
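A generic retry wrapper for this error handling might look as follows. The exception class here is a stand-in for whatever your driver raises on SQLSTATE 40001 (e.g. a serialization failure in psycopg); the helper name and the flaky transaction are invented for the example:

```python
# Hypothetical retry helper: on a snapshot conflict, the WHOLE transaction
# is re-executed so it runs against a newer snapshot.
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization/snapshot-conflict error."""

def run_with_retry(txn, max_attempts: int = 5):
    """Execute `txn` (a callable wrapping one full transaction), retrying on conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller

# Usage with a flaky stand-in transaction: fails twice, then succeeds.
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()
    return "committed"

result = run_with_retry(flaky_txn)
```

Note that the retry must wrap the entire transaction, including its reads; retrying only the failing statement would keep operating on the stale snapshot.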

Serializable

This isolation level extends the snapshot isolation of Repeatable Read by adding conflict monitoring through predicate locks. True serialization is not achieved; instead, an optimistic approach is used, resolving conflicts by aborting transactions when they are detected.

Access to entries is recorded as predicate locks — visible in pg_locks. If a conflict arises, one of the involved transactions is aborted with an error message. Predicate locks do not play a role in deadlocks!

Important points when using Serializable:

  • Exception Handling and Consistency
    Applications must implement error handling for serialization exceptions by retrying the entire transaction.
  • Read Consistency
    Reads are only considered consistent after a successful commit because a transaction may still be aborted at any time.
    This does not apply to read-only transactions. Their reads are consistent from the moment a snapshot is established; these transactions must be flagged as read-only.
  • Locking
    Explicit locking becomes unnecessary when Serializable is used globally. In fact, for performance reasons, explicit locks should be avoided!
  • Mixing Isolation Levels
    Conflict monitoring only applies to transactions running under Serializable. Therefore, it is recommended to set the isolation level globally.

Important:
Keep in mind that sequential scans and long-running transactions can lead to a high number of aborts as the number of concurrent users increases. Serializable works best with small, fast transactions.