Trading ADRs
1. Use Markdown Architectural Decision Records
- Status: proposed
- Date: 2026-01-05
Context and Problem Statement
We want to record architectural decisions made in this project. Which format and structure should these records follow?
Decision Drivers
- Low barrier to entry - since documenting is always cumbersome, the format should not burden you any more than necessary. E.g. we do not necessarily need an 11-point document for each and every decision.
- Overview - an ADR should provide a list of considered options and why you chose one of them while discarding others.
- Tool support - there is a strong need to integrate ADRs with our central (Hugo-based) documentation, as natively as possible. Previous projects have shown that scattering relevant documentation leads to a lot of confusion.
Considered Options
- MADR 2.1.0 - The Markdown Architectural Decision Records
- Michael Nygard’s template - The first incarnation of the term “ADR”
- Sustainable Architectural Decisions - The Y-Statements
- Other templates listed at https://github.com/joelparkerhenderson/architecture_decision_record
- Formless - No conventions for file format and structure
Decision Outcome
Chosen option: “MADR 2.1.0”, because
- Implicit assumptions should be made explicit. Design documentation is important to enable people to understand the decisions later on. See also A rational design process: How and why to fake it.
- The MADR format is lean and fits our development style.
- The MADR structure is comprehensible and facilitates usage & maintenance.
- The MADR project is actively maintained.
- Version 2.1.0 is the latest one available when starting to document ADRs.
Pros and Cons of the Options
MADR
(This whole Document is an Example)
- Good, because it has only a few mandatory, concise fields
- Good, because optional fields provide more information about reasoning and weighing different options
- Good, because it allows linking to other ADRs
Y-Statement
In the context of capturing architecture decisions, facing tedious verbosity, scattered documentation and knowledge drain from people leaving the project, we decided to implement ADRs using the MADR template and neglected other formats like Y-Statement, Nygard and similar to achieve a simple, lightweight and modular approach to capturing ADs, accepting fewer guardrails and more verbosity than a Y-Statement, because our decisions are not overly complex and are decided within days to weeks instead of months but still need thorough consideration of multiple options in a structured manner.
- Good, because it results in brief ADRs
- Good, because it allows capturing existing decisions rather quickly
- Bad, because options are not considered in depth
- Bad, because condensing decisions this much requires more thought, not less
- Bad, because it lacks information for future readers who are new to the project
NygardADR
- Bad, because it has few fields for weighing options
- Bad, because ambiguous fields might lead to vastly different focal points when making decisions
Links
You can add links via the adr tool e.g.:
adr link 1 Amends 0 "Amended by"
or while creating a new one:
adr new -l "0:Amends:Amended by" Use Markdown Architectural Decision Records.
The other option is to use manual linking like this:
[ADR0001]({{< relref "docs/architecture/adr/0000-empty-adr-template.md" >}}) produces:
ADR0001
Or use normal Markdown links (also supports relative linking): [\<Link Name\>]\(docs/architecture/adr/0000-empty-adr-template.md) - Produces: Link Name
2. Violating the One TX per Aggregate rule - Case BuyVoucher
- Status: accepted
- Date: 2026-02-11
Context and Problem Statement
In the current domain model, Voucher and BankAccount are separate aggregates. To maintain atomicity when purchasing a voucher, the application currently modifies both within a single transaction, which technically violates the Domain-Driven Design (DDD) principle of one aggregate per transaction. The general concept can be applied to similar situations in other services.
Decision Drivers
- Code clarity & Maintainability - new developers need to understand what is happening.
- Scalability - The need to avoid database deadlocks during high-volume voucher purchases.
- Data consistency - ACID Guarantees for critical paths.
- Resource Constraints - Time for implementation is not unlimited
Considered Options
- Pragmatic Transaction - Modifying both BankAccount and Voucher within a single database transaction
- Combined Aggregate - Combine Voucher and BankAccount into one aggregate
- Saga / Process Manager - Use a form of process manager that orchestrates / choreographs the process of buying a voucher: 1. First TX: withdraw money and set status “Payment Pending”. 2. Send an event. 3. Second TX: create the voucher and set status “Completed”.
- Internal Eventing - First update the BankAccount and create the Voucher asynchronously via internal Spring events. Withdraw money, then trigger an internal event for a handler to create the voucher.
Decision Outcome
Chosen option: “Pragmatic Transaction” (i.e. accept the violation), because it scores best on most decision drivers (see below). We need immediate consistency, as the abuse risk of eventual consistency is high when issuing vouchers, and combining the two aggregates does not make sense from a domain perspective. At a later stage of development, or in other cases, this decision might turn out differently.
Pros and Cons of the Options
| Solution | Pros | Cons |
|---|---|---|
| Pragmatic Transaction | Easy to implement; immediate consistency; highly readable code. | Violates DDD rules; couples aggregates at the DB level; potential locking issues under high load. |
| Saga / Process Manager | Scalable; clean decoupling; strictly adheres to DDD principles. | High implementation effort; requires resilience logic (retries) for crashes; eventual consistency. |
| Combined Aggregate | Solves the transaction problem in a rule-compliant way. | Risk of “God-Aggregates”; might lead to unnatural domain grouping. |
| Internal Eventing | Decouples logic within the code. | Dangerous: Without Outbox Pattern/persistent events, risk of data loss (money gone, no voucher) if the system crashes after the 1st TX. |
Internal eventing and the saga have one advantage in common: both allow for easy extraction of the Bank or Voucher service later in development.
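The Pragmatic Transaction option can be illustrated with a minimal in-memory sketch. All class and method names here (`BankAccount`, `Voucher`, `BuyVoucherService`, `buyVoucher`) are hypothetical and not taken from the service. In the real code a database transaction (for example via Spring's `@Transactional`) would provide the atomicity; the manual compensation below only imitates that rollback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregate: holds a player's balance.
final class BankAccount {
    private long balance;

    BankAccount(long balance) { this.balance = balance; }

    long balance() { return balance; }

    void withdraw(long amount) {
        if (amount <= 0 || amount > balance) {
            throw new IllegalArgumentException("invalid amount or insufficient credits");
        }
        balance -= amount;
    }

    void deposit(long amount) { balance += amount; }
}

// Hypothetical aggregate: a purchased voucher.
record Voucher(String playerId, String type, long price) {}

// Application service that modifies both aggregates in one unit of work.
final class BuyVoucherService {
    private final List<Voucher> vouchers = new ArrayList<>();

    List<Voucher> vouchers() { return List.copyOf(vouchers); }

    Voucher buyVoucher(BankAccount account, String playerId, String type, long price) {
        account.withdraw(price);                 // change to aggregate 1
        try {
            if (type == null || type.isBlank()) {
                throw new IllegalArgumentException("voucher type is required");
            }
            Voucher voucher = new Voucher(playerId, type, price);
            vouchers.add(voucher);               // change to aggregate 2
            return voucher;
        } catch (RuntimeException e) {
            account.deposit(price);              // imitate the rollback of aggregate 1
            throw e;
        }
    }
}
```

Either both changes become visible or neither does, which is the immediate consistency the decision relies on.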
3. Existence of robots is not validated by Trading
- Status: accepted
- Date: 2026-02-11
Context and Problem Statement
In the trading service, the player is able to purchase vouchers, either to construct new robots or to upgrade existing ones. However, the trading service does not maintain a copy of all robots in possession of players. This means that it cannot verify whether the robot to be upgraded is in the possession of the player issuing the intent. It also cannot verify whether a robot died before an upgrade could be applied.
Considered Options
- Ignore Existence - We can simply state in the rules that it is the player's job to keep track of their robots and that, if they purchase a component for a nonexistent robot, it’s their fault.
- Check Existence ECST - We could synchronise all robots via event-carried state transfer (ECST) and check whether the robot a) exists and b) is owned by the player.
- Derive Ownership - We could construct a list of Construction vouchers purchased and approximate a mistake.
Decision Outcome
Chosen option: Ignore Existence, because deriving ownership only prevents a player from upgrading a robot that never existed, but not a robot that has died. The ECST approach, while theoretically possible, would introduce another eventual consistency situation that would have to be rolled back in case of failure. It would, however, be possible to issue refunds using a saga pattern by listening to failure events from the robot service.
4. PlayerId is not authenticated
- Status: preliminary decision
- Date: 2026-02-12
Context and Problem Statement
When processing intents, the player usually supplies their playerId in the form of a Kafka key. Alternatively, that value is in the headers of each intent. This playerId is fully controlled by the player and cannot be verified. This allows players to impersonate one another and issue intents on each other's behalf.
Decision Drivers
- Performance
- Fairness
- Complexity
Considered Options
- Establish Gentlemen’s Agreement - Tell all participants that this is strictly forbidden / obvious and exclude deviants from the workshop.
- Verify using JWTs - Issue tokens via an identity provider such as Keycloak and verify the JWT on every intent.
- Use shared secret - When joining a game, each player somehow gets a secret which is used when sending intents. Needs strict ACLs on Kafka topics.
Decision Outcome
Chosen option: Establish Gentlemen’s Agreement, as this is the easiest option and all other solutions are way too complex for now. JWTs and other crypto methods are not really an option, since the intent volume is expected to be very high.
Positive Consequences:
- Reduced Complexity & increased performance by not implementing complex auth mechanisms per intent.
Negative Consequences:
- Nothing hinders a player from:
  - Impersonating other players (issuing new purchases, draining their account for invalid upgrades/vouchers)
  - Increasing other players' sequence_numbers so they can no longer issue intents → Theoretically requires idempotency by using eventIds
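The idempotency mentioned in the last point could look roughly like the following sketch, which deduplicates intents by event id instead of trusting a player-supplied sequence number. The `Intent` record, its fields and the `IdempotentIntentProcessor` class are assumptions for illustration; a real implementation would persist the processed ids rather than keep them in memory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical intent shape; the field names are assumptions.
record Intent(String eventId, String playerId, String payload) {}

final class IdempotentIntentProcessor {
    private final Set<String> processedEventIds = new HashSet<>();
    private int handledCount = 0;

    // Returns true if the intent was handled, false if it was a duplicate.
    boolean process(Intent intent) {
        // Set.add returns false when the id is already present.
        if (!processedEventIds.add(intent.eventId())) {
            return false;
        }
        handledCount++; // placeholder for the actual business logic
        return true;
    }

    int handledCount() { return handledCount; }
}
```

With this approach a replayed or duplicated intent is simply skipped, and no player can block another by inflating a sequence number.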
5. Event categories and replication strategy in Trading
- Status: accepted
- Date: 2026-02-12
Context and Problem Statement
The trading service exchanges several types of events with other services and players. These cover both business actions and replicated state. A consistent categorization is required to keep responsibilities clear and to avoid mixing state replication with business signaling.
Considered Options
- Use a single generic category of events for both state replication and business actions.
- Use only state replication events (event-carried state transfer) and let consumers derive actions from state changes.
- Use only domain events and derive any needed replicated state from them.
- Separate events into different categories for intents, business domain events, replicated state, and dead-letter handling.
Decision Outcome
Chosen option: “Separate event categories”.
The trading service uses distinct categories of events:
- Intents from players to express desired actions.
- Domain events that represent completed business actions.
- Event-carried state transfer events to replicate aggregate state to other services.
- Dead-letter topics for failed processing.
This separation keeps concerns clearer for both producers and consumers, at the cost of more topics and mapping logic.
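One way to make the four categories explicit in code is a small enum that maps each category to a topic prefix. This is only a sketch: the prefixes and the `topicFor` helper are hypothetical and do not reflect the actual topic names used by the trading service.

```java
// The four categories from the decision above; topic prefixes are hypothetical.
enum EventCategory {
    INTENT("trading.intent"),
    DOMAIN_EVENT("trading.event"),
    STATE_TRANSFER("trading.state"),
    DEAD_LETTER("trading.dlt");

    private final String topicPrefix;

    EventCategory(String topicPrefix) { this.topicPrefix = topicPrefix; }

    // Builds a topic name such as "trading.intent.voucher".
    String topicFor(String name) { return topicPrefix + "." + name; }
}
```

Producers and consumers then refer to a category rather than a raw topic string, which keeps state replication and business signaling visibly apart.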
6. Error handling with exceptions and dead-letter topics
- Status: accepted
- Date: 2026-02-12
Context and Problem Statement
The trading service needs a consistent approach for handling errors in domain logic and technical processing, especially when reacting to incoming events. Different error types require different handling and visibility for players and operators.
Considered Options
- Represent errors explicitly in domain APIs using result types instead of exceptions.
- Use checked or unchecked exceptions in domain logic, with per-use-case handling.
- Use exceptions everywhere and catch in outer layers to route failed messages to dead-letter topics based on error type.
Decision Outcome
Chosen option: “Use exceptions everywhere with dead-letter topics”.
Domain and application code signal errors using exceptions. Listener components classify errors and route failed messages to:
- A dead-letter topic intended for player-visible problems, such as invalid intents or insufficient credits.
- A separate dead-letter topic for unexpected technical failures that require operational follow-up.
Usually this is done via the KafkaErrorHandler, as it keeps domain APIs simple while standardizing error handling. However, there is currently one exception that needs further discussion: the VoucherIntentListener differentiates between three specific exception types (DomainExceptions, InvalidIntentExceptions and JsonProcessingExceptions) and all other exceptions. Domain and processing exceptions that occur while processing the payload are handed to the VoucherEventHandler, which instead tries to publish a specific VoucherFailureEvent with as much information as possible. This would be tricky and more verbose to handle via a Kafka error handler, but is theoretically possible. One drawback of the KafkaErrorHandler is that it is tricky to open a transaction context there, so there is no outbox for failure events published this way.
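The classification step can be sketched in plain Java as follows. The exception classes here are simplified stand-ins for the service's own types, and the topic names and the `DeadLetterRouter` class are hypothetical. JSON parsing errors are left out to keep the sketch free of library dependencies.

```java
// Hypothetical exception types standing in for the service's own hierarchy.
class DomainException extends RuntimeException {
    DomainException(String message) { super(message); }
}

class InvalidIntentException extends RuntimeException {
    InvalidIntentException(String message) { super(message); }
}

final class DeadLetterRouter {
    // Hypothetical topic names.
    static final String PLAYER_DLT = "trading.dlt.player";
    static final String TECHNICAL_DLT = "trading.dlt.technical";

    private DeadLetterRouter() {}

    // Player-caused problems go to the player-visible topic;
    // everything else is treated as an unexpected technical failure.
    static String topicFor(Throwable error) {
        if (error instanceof DomainException || error instanceof InvalidIntentException) {
            return PLAYER_DLT;
        }
        return TECHNICAL_DLT;
    }
}
```

Keeping the mapping in one place means listeners only need to catch and delegate, while domain code stays free of routing concerns.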
7. Validation of game and player existence in Trading
- Status: accepted
- Date: 2026-02-12
Context and Problem Statement
When opening bank accounts or processing purchases, the trading service works with game and player identifiers that originate from other services. It is unclear whether the trading service should synchronously validate that the referenced game and player actually exist and are related.
Considered Options
- Perform synchronous calls to the game and player services to validate identifiers before processing.
- Use replicated game and player state and check against that local view.
- Trust the upstream events and identifiers and avoid additional existence checks in trading.
Decision Outcome
Chosen option: “Trust upstream identifiers”.
The trading service does not perform additional existence checks for games and players when opening bank accounts or handling purchases. It relies on:
- Core events from the game service to indicate that a player joined a game.
- The assumption that upstream services only emit consistent identifiers.
This reduces coupling and avoids extra latency, while accepting that inconsistencies can occur if upstream services misbehave.
8. Domain coupling to outbox for event publication
- Status: accepted
- Date: 2026-02-12
Context and Problem Statement
The trading service needs to publish events reliably whenever domain state changes, especially for bank accounts and vouchers. A strict separation between domain logic and infrastructure would suggest that domain code should not depend on event publishing concerns, but this can complicate ensuring atomicity between state changes and outgoing events.
Considered Options
- Keep domain code free of any dependencies on event publication and let infrastructure observe and publish domain events.
- Allow domain or application services to depend on an abstraction for writing to the outbox, executed within the same transaction as state changes.
Decision Outcome
Chosen option: “Use an outbox abstraction from domain and application services”.
Domain and application code use an abstraction to write events to a transactional outbox as part of the same transaction that modifies domain state. This creates a deliberate dependency from domain logic to an infrastructure concern, in exchange for:
- Atomic publication of events together with state changes.
- Simplified reasoning about when events are written.
The trade-off of relaxing strict architectural boundaries is considered acceptable for this service.
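A minimal sketch of such an outbox abstraction is shown below. The `Outbox` interface, the `OutboxEvent` record, the topic name and the `BankAccountService` class are all assumptions for illustration. The in-memory implementation stands in for a real one that would insert into an outbox table within the same database transaction as the state change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event and outbox port; names are assumptions.
record OutboxEvent(String topic, String key, String payload) {}

interface Outbox {
    void append(OutboxEvent event);
}

// In-memory stand-in for a transactional outbox table.
final class InMemoryOutbox implements Outbox {
    private final List<OutboxEvent> entries = new ArrayList<>();

    @Override
    public void append(OutboxEvent event) { entries.add(event); }

    List<OutboxEvent> entries() { return List.copyOf(entries); }
}

// Application service that depends on the outbox abstraction directly.
final class BankAccountService {
    private final Outbox outbox;
    private long balance;

    BankAccountService(Outbox outbox, long initialBalance) {
        this.outbox = outbox;
        this.balance = initialBalance;
    }

    long balance() { return balance; }

    // State change and event write happen in the same unit of work.
    void deposit(String playerId, long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balance += amount;
        outbox.append(new OutboxEvent("trading.state.bank-account", playerId, "balance=" + balance));
    }
}
```

Because the service only sees the `Outbox` interface, the dependency on infrastructure stays narrow even though it is deliberate.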
9. Spawn position is not validated by Trading
- Status: accepted
- Date: 2026-03-29
Context and Problem Statement
When processing intents by the player to construct / spawn a robot, the trading service might receive a spawn position in the form of a coordinate. However, the trading service does not maintain a copy of the game map and thus cannot validate the spawn position, e.g. whether it is within the map boundaries or whether it is a space station. This makes it possible for players to issue intents with invalid spawn positions, which would then lead to as yet undefined consequences in the robot service when the voucher is applied.
Considered Options
- Store a copy of the game map in the trading service and validate spawn positions against it.
- Trust the player / let robot handle invalid spawn positions. The robot service is free to ignore an invalid spawn position and spawn the robot at a default position, or to reject the voucher.
Decision Outcome
Chosen option: “Trust the player / let robot handle invalid spawn positions”. As a consequence, should the robot service reject the voucher, players lose money. Currently this is acceptable, but it might change in the future, e.g. by introducing a refund mechanism for failed vouchers. The main reasoning for this decision is that maintaining a copy of the game map in the trading service would introduce additional complexity and coupling, while the current approach allows for simpler processing of spawn intents at the cost of potential player errors.
Links
Similar to the decision about validating the existence of robots:
[ADR0003]({{< relref "docs/architecture/adr/0003-existence-of-robots-is-not-validated-by-trading.md" >}})