Atomic Event Publication
In an event-driven architecture, every change in a domain leads to the publication of an event.
Ideally, the change and the publication would occur atomically within the same transaction; this would give the overall system the highest possible level of consistency.
However, this is not easily achievable, as the database and the broker are typically separate systems. To span a global transaction across both, a transaction manager is needed.
💡 A local transaction refers to a transaction within a single system (or technology), such as a database. When multiple systems are involved, separate local transactions are created independently. What’s missing is a mechanism to coordinate them. That’s where the transaction manager comes in — a separate system that synchronizes local transactions into a global transaction. This is the only way to achieve exactly once semantics across systems. All other approaches relax this guarantee.
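To make the problem concrete, here is a minimal sketch of the uncoordinated dual write. OrderRepository (database), EventPublisher (broker client), Order and OrderPlaced are hypothetical placeholders; the point is only that the two calls run in independent local transactions.

```java
// Minimal sketch of the dual-write problem: the database write and the broker publish
// are two independent local transactions with nothing coordinating them.
public class OrderService {

    private final OrderRepository repository; // local transaction #1: the database
    private final EventPublisher publisher;   // local transaction #2: the broker

    public OrderService(OrderRepository repository, EventPublisher publisher) {
        this.repository = repository;
        this.publisher = publisher;
    }

    public void placeOrder(Order order) {
        repository.save(order);                             // commits to the database
        // <-- a crash here leaves the order saved but the event unpublished
        publisher.publish(new OrderPlaced(order.getId()));  // publishes to the broker
    }
}
```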
The problem is very similar to ACID and CAP considerations in distributed systems, where trade-offs are made between different properties—or guarantees.
What are these guarantees?

- Ordering Guarantees: a guarantee regarding the order of processing, i.e., whether events must be processed in the order in which they were originally published or not.
- Delivery / Publication Guarantees: a guarantee regarding the delivery of events, i.e., whether events may be delivered multiple times, exactly once, or at most once.
- Read Your Writes: a guarantee about immediate consistency, i.e., whether a system can read its own changes right after writing them.
It’s important to note that some guarantees exist along a spectrum. The theory is much more detailed, so this section serves merely as an introduction. The following sections present three approaches to solving this problem, plus one that more or less ignores it. Each fulfills different guarantees and comes with its own trade-offs.
Transaction Manager
A transaction manager coordinates local transactions across independent systems — essentially acting as the orchestrator of a two-phase commit (2PC).
exactly once in-order – A transaction manager provides by far the highest level of consistency between two systems. For this reason, it is often used in banking systems.
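As a rough sketch of what this looks like in code, assuming a JTA transaction manager (e.g. Narayana or Atomikos) with an XA-capable database and broker connector; OrderRepository, XaEventPublisher, Order and OrderPlaced are hypothetical placeholders:

```java
// Sketch of a global transaction via JTA. Both resources are enlisted in the same
// transaction, so the transaction manager can drive a 2PC across them.
import jakarta.transaction.UserTransaction;

public class OrderService {

    private final UserTransaction userTransaction; // provided by the transaction manager
    private final OrderRepository repository;      // backed by an XA DataSource
    private final XaEventPublisher publisher;      // backed by an XA connector (e.g. JMS)

    public OrderService(UserTransaction userTransaction,
                        OrderRepository repository,
                        XaEventPublisher publisher) {
        this.userTransaction = userTransaction;
        this.repository = repository;
        this.publisher = publisher;
    }

    public void placeOrder(Order order) throws Exception {
        userTransaction.begin();
        try {
            repository.save(order);                             // enlisted resource #1
            publisher.publish(new OrderPlaced(order.getId()));  // enlisted resource #2
            userTransaction.commit();                           // 2PC: prepare, then commit both
        } catch (Exception e) {
            userTransaction.rollback();                         // both roll back together
            throw e;
        }
    }
}
```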
Disadvantages

- Performance: A transaction manager introduces significant overhead, as it is a separate service that orchestrates a 2PC between systems. In Kafka, for example, using local transactions would also mean that events could no longer be processed in batches.
- Cost: Transaction managers are often commercial products; there are few, if any, free alternatives.
- Support: The involved systems must implement certain standards (such as XA) to integrate with a transaction manager.
Transactional Outbox Pattern
With a transactional outbox, the problem of independent transactions is deferred by storing events in the database as part of the same local transaction. A separate process reads and publishes these events in order.
at least once in-order – The storage of events happens exactly once, while their publication is at least once. The problem of independent transactions is effectively shifted by this pattern. As long as reading and publishing do not happen concurrently, the event order is preserved.
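A minimal sketch of the write side, assuming Spring's @Transactional and hypothetical OrderRepository, OutboxRepository and OutboxEvent types: the domain change and the outbox row are written in one local database transaction.

```java
// Write side of a transactional outbox: domain state and outbox entry share one
// local database transaction, so they either both commit or both roll back.
import org.springframework.transaction.annotation.Transactional;

public class OrderService {

    private final OrderRepository orders;
    private final OutboxRepository outbox;

    public OrderService(OrderRepository orders, OutboxRepository outbox) {
        this.orders = orders;
        this.outbox = outbox;
    }

    @Transactional // a single local transaction covering both writes
    public void placeOrder(Order order) {
        orders.save(order);
        outbox.save(new OutboxEvent("OrderPlaced", order.getId())); // published later by the relay
    }
}
```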
Types of Transactional Outboxes

- Polling Outbox: The database is regularly polled for new events. This approach has higher latency and significantly increased resource consumption. A minimal relay sketch follows this list.
- Tailing or Subscriber Outbox: This type subscribes to a change stream of the database. Databases like Redis and MongoDB offer built-in subscriber mechanisms. For others, the write-ahead log can be used to achieve the same effect.
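Here is a minimal sketch of a polling relay, assuming Spring's @Scheduled and hypothetical OutboxRepository, OutboxEvent and EventPublisher types; the polling interval, ordering query and status update are assumptions:

```java
// Polling outbox relay: reads unpublished events in insertion order and publishes them
// one by one. Ordering holds as long as only a single relay instance is active.
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;

public class OutboxRelay {

    private final OutboxRepository outbox;
    private final EventPublisher publisher;

    public OutboxRelay(OutboxRepository outbox, EventPublisher publisher) {
        this.outbox = outbox;
        this.publisher = publisher;
    }

    @Scheduled(fixedDelay = 500) // the interval trades latency against database load
    public void relay() {
        List<OutboxEvent> pending = outbox.findUnpublishedOrderByIdAsc();
        for (OutboxEvent event : pending) {
            publisher.publish(event);    // a retry after a crash re-publishes -> at least once
            outbox.markPublished(event); // marked only after a successful publish
        }
    }
}
```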
Further Considerations and Challenges

- Scaling: As throughput increases, scaling and latency can become issues. Depending on the context, sharding may help, but it’s important that the outbox sharding aligns with the broker’s sharding; otherwise, event ordering may break.
- Single Point of Failure: A single outbox instance represents a single point of failure. Therefore, a cluster with leader election is needed to ensure high availability and low latency.
Event Persistence
Similar to the outbox pattern, events are stored in the database along with an additional status field. Unlike the outbox, however, the events are published by the same process that wrote them, followed by a status update.
An additional process periodically scans the database for unpublished events and publishes them as a fallback.
at least once out of order – Due to the nature of the publishing mechanism, maintaining the correct order is best effort.
💡 Incidentally, this is the mechanism Spring Modulith uses for its internal eventing. Something to keep in mind.
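A minimal sketch of this mechanism (not Spring Modulith's actual implementation), assuming hypothetical EventRepository, StoredEvent and EventPublisher types: the writing process stores the event with a status, publishes it right away, then updates the status, while a scheduled fallback republishes anything still marked unpublished.

```java
// Event persistence with a status field: store, publish, then mark as published.
// The fallback job may publish late and therefore out of order.
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.transaction.annotation.Transactional;

public class EventingService {

    private final EventRepository events;
    private final EventPublisher publisher;

    public EventingService(EventRepository events, EventPublisher publisher) {
        this.events = events;
        this.publisher = publisher;
    }

    @Transactional // stored in the same local transaction as the domain change
    public void record(StoredEvent event) {
        events.save(event); // status = UNPUBLISHED
    }

    public void publish(StoredEvent event) {
        publisher.publish(event);    // happy path: published immediately after the commit
        events.markPublished(event); // status update in a separate local transaction
    }

    @Scheduled(fixedDelay = 30_000) // fallback for events whose publish or status update failed
    public void republishPending() {
        events.findUnpublished().forEach(this::publish);
    }
}
```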
Separate Transaction
Another option is to more or less ignore the problem.
at most once in-order – Both local transactions would — if possible — be nested, so that a failure in the inner transaction leads to a rollback of the outer one. However, if the outer transaction fails after the inner one has been executed, this will inevitably lead to inconsistency.
💡 This is exactly what happens when using @Transactional in Spring-Kafka and Spring-Data without an external transaction manager.
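A minimal sketch of that situation, assuming Spring-Kafka's KafkaTemplate, Spring's @Transactional and hypothetical OrderRepository, Order and OrderPlaced types (the topic name "orders" is also an assumption):

```java
// "Separate transaction" variant: only the database transaction is managed here.
// If the send fails, the surrounding database transaction rolls back with it; if the
// database transaction fails after the send has gone out, the two systems diverge.
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.transaction.annotation.Transactional;

public class OrderService {

    private final OrderRepository repository;
    private final KafkaTemplate<String, OrderPlaced> kafka;

    public OrderService(OrderRepository repository, KafkaTemplate<String, OrderPlaced> kafka) {
        this.repository = repository;
        this.kafka = kafka;
    }

    @Transactional // database transaction only; no global transaction manager involved
    public void placeOrder(Order order) {
        repository.save(order);
        kafka.send("orders", order.getId(), new OrderPlaced(order.getId())); // not part of the DB transaction
    }
}
```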