1 - About Microservice Dungeon 2.0

Welcome to the Microservice Dungeon! – or simply MSD.

(tbd - elaborate, give some history, team, funding, papers, …)

2 - Game Rules

The following section provides an overview of the game rules. If you’re wondering what the MSD is, check out our About page.

tl;dr

The MSD is a game where player services compete to achieve the highest possible score. Each player service controls a swarm of robots. Robots move across the game board, mine resources, purchase upgrades, and battle each other. Most actions earn points.

The map is a 2-dimensional grid of star systems. Resources are distributed across it. They can be mined, picked up by robots, and sold at space stations. Space stations are also placed on the map and serve as a kind of home base and safe zone. There, robots can sell resources, get repairs, and purchase upgrades. Combat is not allowed on space stations.

Each player has a money account, and starts the game with an initial amount of money. This can be used to purchase new robots. New robots always start at the player’s allocated space station. Selling resources and killing robots increases a player’s account balance. With more money, a player can buy additional robots or upgrade existing ones.

Map

Each game takes place on a map, which is visible to players from the start. The map itself is a stationary grid consisting of tiles - similar to a chessboard - whose positions never change. Each tile has up to eight adjacent neighbors, except at the edges of the map.

Map-Resources

The map for an MSD game is configurable. There are a couple of standard map types, and it is possible to add other types as well for dedicated games. Please refer to the map details page for an in-depth explanation of what elements a map consists of.

Games

The MSD is played in individual games. To participate, players register once and join an open game. This is possible even after the game has started, but late joiners do not receive any compensation for missed playtime. The number of participants is limited and defined at game creation.

Win Condition

A game ends either when an administrator intervenes, or when the predefined game time runs out. The player with the most points wins the game, though rankings are scored in several categories.

Once a player has joined a game, they cannot leave. If they lose all their robots and have no funds left to purchase new ones, their game ends.

Robots

Every player controls robots to compete against other players.

A robot is exactly what you’d expect — a mechanical unit with health points and energy for performing actions. It can move, engage in combat, and improve itself through the purchase of upgrades.

Buying a Robot

Players can purchase new robots at any time during the game using their money. Newly bought robots spawn instantly at the player’s assigned space station, or at another station of the player’s choice.

Action-Cooldown / -Queue [to be discussed]

After performing an action, a robot requires a short pause before executing the next one. This cooldown applies regardless of whether the action was successful or not. As a result, robots may not respond immediately to new commands.

Robots queue up actions and execute them in order. Players should carefully plan the number of commands they issue, as each action has a different cooldown duration. For instance, attacking another robot has a shorter cooldown than moving. Upgrades are available to reduce cooldown durations.

Certain actions, such as applying upgrades or collecting resources, do not trigger a cooldown.

This mechanism is similar to the mining system, with one key difference: Robots execute actions immediately, followed by a cooldown. Mines require processing time before yielding resources.
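
To illustrate the queue-plus-cooldown mechanism described above, here is a minimal sketch in Java. Everything in it is hypothetical: the action types, the cooldown durations, and the class itself are illustrative, not part of the actual game API.

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a per-robot action queue with cooldowns.
class RobotActionQueue {
    enum Action { MOVE, ATTACK }

    // Illustrative durations; per the rules, attacking cools down faster than moving.
    private static Duration cooldownOf(Action action) {
        return switch (action) {
            case ATTACK -> Duration.ofSeconds(1);
            case MOVE -> Duration.ofSeconds(3);
        };
    }

    private final Queue<Action> queue = new ArrayDeque<>();
    private Instant readyAt = Instant.MIN;

    void enqueue(Action action) { queue.add(action); }

    // Executes the next queued action only once the previous cooldown has elapsed.
    void tick(Instant now) {
        if (now.isBefore(readyAt) || queue.isEmpty()) return;
        Action next = queue.poll();
        // ... perform the action; the cooldown applies even if it fails
        readyAt = now.plus(cooldownOf(next));
    }
}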

Energy

Robots have both health points and energy. Some actions consume energy, and if a robot runs out of energy, it pauses and cannot perform further actions until it has recharged enough energy.

Energy automatically regenerates over time. Regeneration is faster on space stations.

Repairing

Robots automatically restore health points over time on space stations. No action required.

Movement

Robots can move horizontally, vertically, and diagonally across the map. Each movement consumes energy and triggers a cooldown. To track enemy robot positions, players must listen to the movement events of other robots.

Fighting [to be discussed]

Robots can attack other robots. Attacks consume energy and trigger a cooldown, regardless of whether the attack is successful or not. To attack, a robot must have enough energy and be in range of the target. Caution, friendly fire is possible!

When an enemy robot is destroyed, the attacker becomes stronger, receives a financial reward and collects the destroyed robot’s resources, regardless of its position.

Mining [to be discussed]

To extract resources, a player starts the mining process at a specific mine. If the player’s robot is on the same planet, it will automatically collect the resources once the mining process finishes.

Starting the mining process consumes energy and triggers a cooldown - even if the mining fails or the robot is elsewhere. Collecting mined resources does not consume energy nor does it trigger a cooldown. But in order to work, the robot must be present on the planet when the mining completes.

Trading

Trading is an essential part of the MSD and takes place at space stations. Players can sell mined resources, purchase new robots, and upgrade existing ones — all in exchange for in-game currency.

Selling Resources

Resources collected by robots can be sold at any space station. To initiate a sale, a robot carrying resources must be present at a space station. Once the sale is started, all resources on that robot are sold to the market, and the player receives the corresponding amount of money in their account.

The value of each resource depends on its rarity. Currently, prices are hardcoded and fixed, but they may fluctuate (based on market demand) in the future. In that case, if a large volume of a specific resource is sold within a short time frame, its market price will drop. Players should consider these fluctuations when planning their mining and trading strategies.

Buying Robots

Players can purchase additional robots at any time during the game. To do so, they must have sufficient funds in their account. New robots are delivered to a space station. If the player does not specify a station, a random one will be chosen.

Purchases are made through robot vouchers, which define how many robots will be spawned. Vouchers are immediately redeemed upon purchase.

Upgrading Robots

Robots can be upgraded to enhance their capabilities. Upgrades can improve:

  • Carrying capacity
  • Combat strength
    • Attack damage
    • Health points
    • Health point regeneration speed
  • Energy
    • Energy regeneration speed
    • Energy capacity
  • Cooldown time

Upgrades are purchased in the form of vouchers, which are tied to a specific robot. To apply an upgrade, the robot must be located at a space station. If the robot is not at a station when the upgrade is purchased, the voucher will be discarded without refund, and the player will be notified.

Debt

Players cannot go into debt. If they do not have enough money to make a purchase, the transaction is declined and an error message is published.

2.1 - Map

What is a Map in MSD?

A map is basically a 2-dimensional grid, structured as shown below. Each tile can be of a specific type, and can be identified (relative to the overall map) by an integer tuple, starting at (0,0) in the lower left corner.

Map-As-Grid

Robots can move over the map in the directions N, NE, E, SE, S, SW, W, and NW (see below). The map boundaries limit movement. E.g., from tile (0,2) you can only move in the northern, eastern, and southern directions, but not to the west.

Map Directions   Movements-at-Border

Star Systems and Voids

Each tile of a map is either a star system or a void. Star systems are connected to each other by hyperlanes, which robots use to travel from one star system to the next. Since voids have no hyperlanes, they are effectively barriers for robot movement; robots need to navigate around them.

Map-Voids

We will create an example map, step by step, to illustrate the map concepts. The image above shows our example map with some barriers on it, on the following tiles:

  • (1,1)…(1,4)
  • (3,6)…(3,7)
  • (7,4)
  • (4,3)
  • (7,0)…(7,1)

Gravity Areas

Traveling along hyperlanes requires energy. Robots have a limited supply of energy, which they eventually need to recharge. An MSD map can have areas with different levels of gravity. Depending on that level of gravity, passing a hyperlane from one star system to the next requires a certain amount of energy (the reference point is always the target star system).

Map-Gravity

The above image shows our map with a “typical” configuration of increasing gravity towards the center of the map. Gravity comes in three levels:

Gravity Level   Energy needed (default)
LIGHT           1
MEDIUM          2
INTENSE         3

Please be aware that the energy needed per level might be configured differently than the default for each individual game. So you cannot rely on these values to be hardcoded.
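
To make the gravity rule concrete, here is a minimal Java sketch of how a player service might compute movement costs. The class and method names are our own, and the defaults simply mirror the table above, since each game may override them.

import java.util.Map;

// Sketch: the energy cost of traversing a hyperlane is determined by the
// gravity level of the TARGET star system. The per-level costs below are
// the documented defaults, but each game may configure them differently.
class MovementCost {
    enum Gravity { LIGHT, MEDIUM, INTENSE }

    private final Map<Gravity, Integer> energyPerLevel;

    MovementCost(Map<Gravity, Integer> energyPerLevel) {
        this.energyPerLevel = energyPerLevel;
    }

    static MovementCost defaults() {
        return new MovementCost(Map.of(
                Gravity.LIGHT, 1, Gravity.MEDIUM, 2, Gravity.INTENSE, 3));
    }

    int costOfMoveInto(Gravity targetSystemGravity) {
        return energyPerLevel.get(targetSystemGravity);
    }
}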

Space Stations, Resource Mines, and Black Holes

Star systems can contain a resource mine or a space station, they can be a black hole, or they can simply be empty. (For simplicity, a star system - in the current version of this game - can only be one of these things. So it cannot be a resource mine and a space station at the same time, nor will it have more than one resource mine.)

Any number of robots - both allied and enemy - can occupy the same star system. The image below shows our example map “ready to play”.

Map-Resources

Space Stations

A space station is a traversable tile that serves as both a trading outpost and a safe zone for robots. Combat is not allowed on space stations. Robots can use space stations to trade, purchase upgrades, and repair themselves. New robots also spawn (are built and launched …) at space stations.

Each player is assigned to a space station, so that new robots for the player fleet will always spawn at the same location. Depending on the number of space stations, several players may share the same station.

Trading and upgrading are only possible at space stations. This means that a robot with its cargo area full of mined resources must travel to a space station in order to sell the resources and to get upgrades. Space stations are neutral and accessible to all robots from all players, i.e. robots can travel to any space station, not only the one they spawned from.

Resource Mines

Some star systems contain mines for resource extraction. Depending on the map type, they can be randomly distributed across the map, or deliberately located only in certain parts of it. Each mine produces a single type of resource, and will continue producing until it is depleted. Each planet can have at most one mine.

There are five types of resources, ranked from most common to rarest (hint: the further back the initial letter is in the alphabet, the more valuable the resource.)

  • Bio Matter
  • Cryo Gas
  • Dark Matter
  • Ion Dust, and
  • Plasma Cores

Our example map above shows a fairly small, but still typical distribution of resources. Bio matter can be found in 8 locations, cryo gas in 5, dark matter in 3, and the most valuable ion dust and plasma cores in 2 locations each. The numbers below the acronyms in the map above indicate the available units of each resource.

Once a mine is depleted, it is closed and disappears from the map. Depending on the particular map type definition, new resources may be discovered once the existing units are partially or fully exhausted. This discovery of new resources will not necessarily be at the same location as the old mine.

Mining takes time. A player service can ask for the mining process to start for a dedicated robot. After a short delay, the resource becomes available on the planet, ready to be picked up (by that particular robot). This happens automatically, assuming the robot is capable of doing so. By default, robots are able to transport the least valuable resource (bio matter) in their cargo area. For all other resources, they need to be upgraded.

All resources are volatile substances. If the robot is not capable of holding the mined resource in its cargo area, the resource remains at the mine and evaporates - i.e. it is lost both to the player who initiated the mining and to all other players.

Black Holes

Black holes can be traversed by robots, but entering a star system with a black hole will - with a certain probability - lead to the robot’s destruction. The default likelihood for destruction is 50%, but this might be configured differently in a particular map.

Summary

Summing up, the tiles in a 2-dimensional MSD map follow this schema [1]:

map = { tile }
tile = star system | void
star system = space station | resource mine | black hole | empty 
resource mine = bio matter | cryo gas | dark matter | ion dust | plasma cores

  1. For the aficionados - this is supposed to be a dumbed-down Extended Backus-Naur Form (EBNF) :-)
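
For illustration, the schema above maps naturally onto a sealed type hierarchy. The following Java sketch uses names of our own choosing; it is not the actual model of the map service.

// Sketch of the tile schema as Java types; all names are illustrative.
sealed interface Tile permits StarSystem, VoidTile {}

record VoidTile() implements Tile {}

sealed interface StarSystem extends Tile
        permits SpaceStation, ResourceMine, BlackHole, EmptySystem {}

record SpaceStation() implements StarSystem {}
record BlackHole() implements StarSystem {}
record EmptySystem() implements StarSystem {}

enum ResourceType { BIO_MATTER, CRYO_GAS, DARK_MATTER, ION_DUST, PLASMA_CORES }

record ResourceMine(ResourceType resource, int remainingUnits) implements StarSystem {}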

2.2 - (TO BE REVISED) Items

Items - Draft

Buying Items

Players can buy special items from the market. Probably trading has a price list and can issue items itself. Option 1: Trading owns items

  • Trading has a list of items
  • Players can use funds to buy items
    • Items are saved inside of trading?
    • Items are saved inside of player?
    • Items are saved in a new service (Item Service)?

Using Items

Players can use special items. The core problem: depending on its functionality, an item must be implemented in the respective service. E.g.:

  • If the item is a weapon / shield, it must be implemented in the robot service.
  • If the item is a mining tool, it must be implemented in the mining(planet) service.

Basic event flow:

  1. Player uses item
  2. Item service sends event “item type x used”
  3. Robot/World service receives event and executes action
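
Since this section is still a draft, the following Java sketch merely illustrates the proposed event flow; the event shape and the handler are hypothetical.

// Draft sketch only: event shape and handler names are made up.
record ItemUsed(String playerId, String robotId, String itemType) {}

class RobotServiceItemHandler {

    // Step 3 of the flow above: the robot service reacts to "item type x used".
    void on(ItemUsed event) {
        switch (event.itemType()) {
            case "shield" -> activateShield(event.robotId());
            // A mining tool would be handled by the mining (planet) service instead.
            default -> { /* item types owned by other services are ignored here */ }
        }
    }

    private void activateShield(String robotId) {
        // ... apply the shield effect to the robot
    }
}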

2.3 - (TO BE REVISED) Game Rule Discussions

Game Rule Discussions

Trading

  • Robot has to initiate the selling instead of trading (not consistent with other trades)
  • Does selling cost energy?
  • How do we enforce debt consequences? If we do not check money beforehand, what should other services implement as a reaction to “player is in debt”
  • Items are not defined yet

Map

I have a problem with the map being just a bunch of planets, right next to each other with some other fields in between. I would be more comfortable with it being a grid of tiles, where each tile is a star system. This either contains planets & optionally mining stations, is empty, a black hole, or has a space station.

One tile is a star system, it has up to 8 neighbors, connected by hyperlanes.

A star system contains 0..3 planets, space stations, mining stations, and black holes.

graph TD
%%{init: {'flowchart': {'nodeSpacing': 80, 'rankSpacing': 100}}}%%
%% Row 1
A1[Star System A1] --- A2[Star System A2]

%% Row 2
B1[Star System B1] --- B2[Star System B2]

%% Row 3
C1[Star System C1] --- C2[Star System C2]

%% Vertical connections
A1 --- B1
A2 --- B2

B1 --- C1
B2 --- C2

%% Contents of some star systems
A1 --> P_A1[Planet]
A1 --> M_A1[Mining Station]
C1 --> P_C1[Planet]
C1 --> M_C1[Mining Station]
B2 --> SS_B2[Space Station]

classDef star fill:#eef,stroke:#333,stroke-width:2px;
class A1,A2,B1,B2,C1,C2 star;

classDef feature fill:#99f,stroke:#333,stroke-width:1px;
class P_A1,M_A1,P_C1,M_C1,SS_B2 feature;

3 - Getting Started

3.1 - Hello MSD

3.2 - Develop your Player

3.3 - Getting started with operations

Overview of System Architecture

ArgoCD

Combined with Sealed Secrets, this is theoretically all you need. General workflow:

  • Write Manifests, or use Kustomize to pull Helm Charts
  • Encrypt all secrets with kubeseal < secret.yaml > sealed.yaml
  • Push to Git
  • Profit

Installing Tools

4 - Architecture

For service boundaries & ownership see the dedicated service pages below.

4.1 - Design Principles

Global Design Principles

In MSD, we maintain a couple of global design principles that are supposed to help with decision making. Every time we arrive at a prioritization decision, these global principles should help settle the issue. Therefore, this is a living document.

Design-Principles.jpg

Technical Principles

(tbd - just keywords so far)

  • JSON as configuration format (so that it can be reused in REST APIs)

4.2 - Game Service

4.3 - Robot Service

Functionalities

  • Moving
  • Shooting
  • (Mining)
  • Collecting

4.4 - Trading Service

Functionalities

  • Buying Robots
  • Upgrading Robots
  • Selling Resources
  • (Buying Items)

Aggregates / Owns

  • Money
  • (Items)?

4.5 - Map Service

Use Cases

Create new Map Type

Trigger: REST call (POST containing the map type specification, format see TODO)

Responsible Aggregate: MapType

What happens?

  • Store the specification
  • Return the MapType ID

Produced Event(s): MapTypeCreated

Create Map Instance for a new Game, based on a Map Type

Trigger: Event (originating from Game service) that a new game has been created.

Responsible Aggregate: MapType

What happens?

  • TODO

Produced Event(s): MapTypeInstanciated

  • Generating Map
  • Generating Resources
  • Mining / Depleting Resources

Aggregates

See list below.

4.5.1 - MapType (Aggregate)

This page documents the design decisions for the MapType aggregate, which is responsible for specifying a certain type of map, and creating an instance for it.

Configuration Principle

A map type is defined by its size and description. The grid cells are then further specified by a configuration that consists of the following sequential sections.

  1. Gravity distribution zones
    • The gravity distribution is defined via a sequence of map areas, each having a certain gravity level.
    • Those grid cells not covered by this definition have the default gravity value.
  2. Map structure - void / planet definition
    • This section consists of several sequential layers of defining either planets or voids.
      • A void is effectively a barrier, while a planet can carry a resource, a space station, or a black hole.
    • Each new layer overwrites the ones before, allowing for complex distributions
      • e.g. you could define a maze-like structure by first setting the whole map (or a large section of it) to void, and then adding “paths” of planets on top.
      • on the other hand, if you want a large open space with one or several barriers in it, you start with a planet-defining layer, and then add the voids as barriers.
  3. Distributions of resources, space stations, or black holes.
    • Distributions consist of an area definition
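
Since JSON is our configuration format (see the design principles), a map type specification following these three sequential sections could look roughly like this. All field names are placeholders; the format is not yet finalized.

{
  "name": "example-map-type",
  "size": { "width": 10, "height": 8 },
  "gravityZones": [
    { "area": { "from": [3, 2], "to": [6, 5] }, "level": "MEDIUM" },
    { "area": { "from": [4, 3], "to": [5, 4] }, "level": "INTENSE" }
  ],
  "structureLayers": [
    { "type": "planet", "area": { "from": [0, 0], "to": [9, 7] } },
    { "type": "void",   "area": { "from": [1, 1], "to": [1, 4] } }
  ],
  "distributions": [
    { "what": "spaceStation", "count": 2 },
    { "what": "resourceMine", "resource": "bioMatter", "count": 8 }
  ]
}

Note how the second structure layer overwrites part of the first, carving a void barrier from (1,1) to (1,4) out of an otherwise planet-filled map.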

4.5.2 - Map Configuration

Map configuration is handled by the aggregate MapConfig.

Functionalities

  • Generating Map
  • Generating Resources
  • Mining / Depleting Resources

Aggregates

  • Map (Constellation):
    • Planets
    • Mining Stations
  • Resources incl. their distribution

4.6 - Discussion

Discussion

4.7 - Learnings

Learnings

5 - Player Development

5.1 - Local Dev Environment

5.2 - Player Skeletons

6 - Operations

6.1 - Kubernetes

6.2 - Monitoring

6.3 - AWS Hosting

6.4 - Bare-Metal Hosting

This page serves as a reference for how and why we set up the server the way we did. As a short overview:

We are using the following software to manage our server:

  • RKE2: A Kubernetes distribution that is easy to install and relatively robust.
  • Longhorn: A storage solution for Kubernetes that uses the local hard-drive.
  • Nginx-Ingress: A reverse proxy that is used to route traffic to the correct service.
  • Cert-Manager: A Kubernetes add-on that automates the management and issuance of TLS certificates from letsencrypt.
  • Sealed Secrets: A tool for encrypting Kubernetes Secrets into a format that can be safely stored in a public repository.
  • ArgoCD: A declarative, GitOps continuous delivery tool for Kubernetes.
  • Kustomize: A tool for joining & customizing YAML configurations.
  • Helm: A package manager for Kubernetes that makes installing applications pretty simple.

We are using a single Rocky Linux server running RKE2 (a Kubernetes distribution). All software that runs on the server is defined in our ArgoCD repository. This is picked up by ArgoCD (running on the server) and applied continuously, so any changes to the repository are automatically applied to the server. This also means that manual changes will be discarded within seconds.

Overview Diagram

An overview of (almost) all the components running on our server; note that everything inside the blue lines is running in Kubernetes

Let’s dig into the components shown here, starting with the most important:

ArgoCD

The core principle of ArgoCD is pretty simple: if you want to make a change to your cluster, what do you do? Use kubectl apply -f my-manifest.yaml. This is essentially what ArgoCD does, just slightly more sophisticated and automated. Every manifest you define in an “Application” (we’ll get to that) is applied to the cluster. The good thing: you can reference a Git repository, and ArgoCD will automatically apply any changes to the cluster.

Under the hood, ArgoCD uses Kustomize, which is a powerful tool for combining, patching and generally customizing YAML files on the fly. For example, if you have two yaml files:

  1. ingress.yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: game
    spec:
      ingressClassName: nginx
      rules:
      - host: game.microservice-dungeon.de
        http:
          paths:
          - backend:
              service:
                name: game
                port:
                  number: 8080
            path: /
            pathType: Prefix
    
  2. service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: game
    spec:
      ports:
      - name: http
        port: 8080
        protocol: TCP
        targetPort: 8080
      selector:
        app.kubernetes.io/instance: game
        app.kubernetes.io/name: game
      type: ClusterIP
    

You can use Kustomize to combine them into a single file, which can then be applied to the cluster. In fact, kubectl has built-in support for Kustomize. To tell Kustomize which files to combine and how, you use a kustomization.yaml file (named exactly like this), which looks like this:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
resources:
  - ingress.yaml
  - service.yaml

If you run kubectl kustomize . in the directory where the kustomization.yaml file is located, it will combine the two files into a single file and print it to the console.

This is what ArgoCD does, but it does it for you. You define an “Application” in ArgoCD, which points to a kustomization.yaml file in a Git repository. ArgoCD will then automatically apply any changes to the cluster - just like running kubectl apply -k . in the directory where the kustomization.yaml file is located - but continuously, as the files change.

Now, we are not only using plain yaml files, but also Helm charts. These are a bit more complex (see Helm below if you want to learn what they are and why we use them). Simply put, Helm is a package manager, which you can use to install applications on your cluster. Most of the time you customize the installation with “values”, which are basically installation options defined in yaml. Kustomize can be used to render a Helm chart, but you need to use the --enable-helm flag. For example, a kustomization.yaml for cert-manager would look like this:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

helmCharts:
  - name: cert-manager
    repo: https://charts.jetstack.io
    version: v1.17.0
    releaseName: cert-manager
    namespace: cert-manager
    valuesFile: certmanager-values.yaml

With the certmanager-values.yaml file looking like this:

crds:
  enabled: true

This will install cert-manager into the cert-manager namespace, using the certmanager-values.yaml file to customize the installation. Under the hood, ArgoCD converts the chart into yaml files; you can see this by running kubectl kustomize . --enable-helm in the directory where the kustomization.yaml file is located. Helm has to be installed on your system for that to work.

In the ArgoCD install process we have set the following parameters to enable Helm support:

configs:
  cm:
    create: true
    kustomize.enabled: true
    helm.enabled: true
    kustomize.buildOptions: --enable-helm

Sealed Secrets

In order for us to store secrets like passwords and other sensitive configuration in our git repo, we need to encrypt them. This is where Sealed Secrets comes in: a tool that uses asymmetric encryption with a public and private key to encrypt secrets. You can use the kubeseal CLI to encrypt an existing secret like this:

kubeseal --cert sealing-key.pem < secret.yaml > sealed.yaml

This assumes that there is a sealing key (a public key/certificate) lying somewhere on the machine you are using kubeseal on. In our ArgoCD repository it is provided in the applications folder. If you don’t have the key, but have kubectl access, you can either fetch the sealing key with kubeseal --fetch-cert > cert.pem or use kubeseal directly:

kubeseal \
-f path/to/unencrypted/secret.yaml \
-w output/path.yaml \
--controller-name=sealed-secrets \
--controller-namespace=sealed-secrets \
--format=yaml 

Once you apply a sealed secret to the cluster, the controller will decrypt it and make it available under the name and namespace of the original secret.

Updating / Merging a secret:

You can merge / update a sealed secret by creating a secret with the same keys as the original secret and using the --merge-into command. Suppose you have your sealed secret sealed.yaml:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: game-sensitive-config
  namespace: core-services
spec:
  encryptedData:
    DB_NAME: AgCgSNobnCxjTaOpYcDwwfUEqeCAL6loxQDqzWIIgna7B58gbTC3MWUio/...
    DB_PASSWORD: AgBmAh8Yi8Dz+gqVF1GwiFnooEfv8o3xYL3UHEDUhVK2rmSd1f7BHUGVE...
    DB_USER: AgBbVZ99mft7oVuWcHpSV0D+hRRvFousesknAxfVgMdOwRO1BzTYin1SmlRdf...
    RABBITMQ_PASSWORD: AgB5fB3P3O/tLuJyPjg7cu3TQcebJAWJbsqoR4ucy8Z8WFhFJ9L...

and the values you want to change in secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: any-name-works
  namespace: core-services
type: Opaque
stringData:
  DB_NAME: "database"
  DB_PASSWORD: "password"

Use the following command to update the sealed secret (optionally with the --cert flag for local use):

kubeseal -o yaml --merge-into sealed.yaml < secret.yaml

Some notes:

  • Sealing keys automatically rotate every 30 days, so you should re-fetch the sealing key every once in a while. If you keep the private keys backed up somewhere, you also need to re-fetch those, as they are rotated as well. This security feature ensures that if one of your decryption keys is compromised, it only affects the last 30 days.
  • Do not commit any unsealed secrets to the repository. If you do, change all the passwords of the affected services. Don’t just delete the secret or encrypt it afterwards.
  • Sealing is designed to be a one-way process. You can unseal a sealed secret if you have the private key, but that is not recommended by the developers.

Ingress & TLS

An Ingress is used to route traffic to the correct service. It does so based on host and path; in our case, for example, game.microservice-dungeon.de routes to the game service, while robot.microservice-dungeon.de routes to the robot service. Some resources might need encryption; in our setup we can use cert-manager to issue TLS certificates from letsencrypt. You just need to include the annotation cert-manager.io/cluster-issuer: letsencrypt-production and the tls section in your ingress resource:

  tls:
    - hosts:
        - my.domain.com
      secretName: some-secret-name

Afterwards cert-manager will issue a certificate for your ingress.

If you want to read more about how cert-manager works, read the letsencrypt documentation on HTTP01 and DNS01 solvers. We use the hetzner-webhook to solve DNS01 challenges, so you can also issue wildcard certificates.
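
Putting the pieces together, a complete ingress that combines the game ingress from the ArgoCD section with the annotation and tls section could look like this (the secret name is arbitrary; cert-manager will create the secret for you):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: game
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - game.microservice-dungeon.de
      secretName: game-tls
  rules:
  - host: game.microservice-dungeon.de
    http:
      paths:
      - backend:
          service:
            name: game
            port:
              number: 8080
        path: /
        pathType: Prefix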

Storage

Storage is usually a bit tricky in Kubernetes, since the hard drives are hooked up to specific nodes. Even though we only use a single node, we still use Longhorn to manage our storage. It provides a storage class called longhorn-static which you can use to create persistent volumes. We have a total capacity of 1 TB.

Helm

Helm is a package manager for Kubernetes. It works by using templates that are configured via a values file. Usually a template looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name:  {{ include "robot.fullname" . }}
  labels:
    {{- include "robot.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicas | int }} 
  selector:
    matchLabels:
      {{- include "robot.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "robot.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "robot.serviceAccountName" . }}
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.registry }}/{{ .Values.image.name }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        {{- with .Values.service.my }}
          - name: {{ .portName | lower }}
            containerPort: {{ .port }}
            protocol: {{ .protocol }}
            {{- end }}
        {{- if .Values.resources }}
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        {{- end }}
        envFrom:
          - configMapRef:
              name: {{ include "robot.fullname" . }}
        {{- if .Values.env }}
        env:
        {{- range $key, $value := .Values.env }}
          - name: {{ $key }}
            value: {{ tpl $value $ | quote }}
            {{- end }}
        {{- end }}

They are a bit hard to read, but basically they are normal yaml files with placeholders. Everything prefaced with .Values is something coming from the values file. There are some built-in functions like include, toYaml, tpl and more.

Remote Management of the HPC Node

Our compute node in the HPC is managed remotely via IPMI. To use IPMI, we have to log in via m02, another node. After that, we can use the ipmitool on the command line. The IP addresses are as follows:

  • goedel-m01: 10.218.112.200
  • goedel-m02: 10.218.112.201

A collection of administration commands:

# Query power status
ipmitool -U ADMIN -P ag3.GWDG -H 10.218.112.200 -I lanplus power status
# Power on
ipmitool -U ADMIN -P ag3.GWDG -H 10.218.112.200 -I lanplus power on
# Power off
ipmitool -U ADMIN -P ag3.GWDG -H 10.218.112.200 -I lanplus power off
# Query sensor status (shows temperatures, fan speeds, and other sensor readings)
ipmitool -U ADMIN -P ag3.GWDG -H 10.218.112.200 -I lanplus sensor
# Power consumption
ipmitool -U ADMIN -P ag3.GWDG -H 10.218.112.200 -I lanplus dcmi power reading

7 - Reference

7.1 - Game

7.2 - Robot

7.3 - Planet

7.4 - Trading

7.5 - Dashboard

8 - Glossary

Entities

Domain Description
Black hole TODO
Cooldown TODO
Energy TODO
Game TODO
Life points TODO
Map TODO
Mine TODO
Planet TODO
Player TODO
Resource TODO
Robot TODO
Space station TODO
Upgrade TODO
Void TODO

Actions

Domain Description
Mining TODO

9 - Wiki

9.1 - Event Driven Architecture

1. Patterns

1.1 Idempotent Event Consumer

Consumers must be able to process events idempotently. In general, an at least once delivery guarantee is assumed. This means that a consumer may receive the same event multiple times.

There are several reasons for this:

  • Typically, consumers receive and acknowledge events in batches. If processing is interrupted, the same batch will be delivered again. If some events in the batch have already been processed, a non-idempotent consumer will process them again.

  • Some brokers (including Kafka) support transactions, allowing receiving, acknowledging, and sending new events to occur atomically. However, it’s not that simple:

    • First, these transactions are very costly and lead to significantly higher latencies. In larger systems, those costs add up. These costs arise because processing no longer happens in batches as each event is transmitted individually, along with additional calls similar to a two-phase commit.

    • Second, this approach only works as long as a single technology is used. Otherwise, a transaction manager is required - this is the difference between local and global transactions. Without one, events may be lost or published multiple times — bringing us back to the original problem. (See atomic event publication)

Idempotent Business Logic

In most (or even all?) cases, business logic can be made idempotent. It’s important to distinguish between absolute and relative state changes.

  • Absolute state changes:
    With an absolute state change, a value is typically replaced by a new one — such as an address. These operations are always idempotent.

  • Relative state changes:
    A relative state change occurs when, for example, an amount is deducted from an account balance. Reprocessing the event would alter the balance again. Alternatively, an account balance can be modeled as a sequence of transactions. If the event ID is unique and part of the SQL schema, reprocessing would result in an error — this is essentially event sourcing.
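
As a minimal sketch of this transaction-sequence idea (plain JDBC with a PostgreSQL-style schema; all names are illustrative):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch: the balance is modeled as a sequence of transactions whose primary
// key is the event ID. Assumed schema (illustrative):
//   CREATE TABLE account_tx (
//     event_id   VARCHAR PRIMARY KEY,  -- unique per event
//     account_id VARCHAR NOT NULL,
//     amount     NUMERIC NOT NULL
//   );
class AccountLedger {
    void applyEvent(Connection con, String eventId, String accountId, BigDecimal amount)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO account_tx (event_id, account_id, amount) VALUES (?, ?, ?)")) {
            ps.setString(1, eventId);
            ps.setString(2, accountId);
            ps.setBigDecimal(3, amount);
            ps.executeUpdate(); // reprocessing the same event violates the primary key
        }
    }
    // The balance is derived, e.g.:
    //   SELECT SUM(amount) FROM account_tx WHERE account_id = ?
}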

Remembering Processed Events

Alternatively, the IDs of processed events can be stored in the database. Before processing, a lookup can be performed to check whether the event has already been handled.

For consistency, it’s crucial that storing the event ID happens within the same transaction as all other domain changes.

A prerequisite for a simple solution without locking is that the same event cannot be processed in parallel by different consumers. With brokers like Kafka, this is guaranteed — there can only be one consumer per partition.
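
A minimal sketch of this lookup-and-store approach (plain JDBC, PostgreSQL syntax; the table and class names are our own):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Assumed table (illustrative):
//   CREATE TABLE processed_events (event_id VARCHAR PRIMARY KEY);
class IdempotentConsumer {

    interface DomainChange {
        void apply(Connection con) throws SQLException;
    }

    void handle(Connection con, String eventId, DomainChange change) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO processed_events (event_id) VALUES (?) ON CONFLICT DO NOTHING")) {
            ps.setString(1, eventId);
            if (ps.executeUpdate() == 0) {
                // Event ID already stored: this is a redelivery, skip processing.
                con.rollback();
                return;
            }
            // The domain change joins the SAME transaction as the event-ID insert,
            // so either both are committed or neither is.
            change.apply(con);
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}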

Transactional Event Processing

Another alternative is the use of local (or global) transactions, accepting the associated drawbacks — assuming the broker supports transactions at all.

It’s important to note that when using multiple technologies, a transaction manager is required. Local transactions only work within a single technology — in this case the broker. That’s why exactly once processing works in Kafka Streams, where Kafka takes on the role of a database as well.

1.2 Atomic Event Publication

Every change in a domain leads to the publication of an event in an event-driven architecture.

For the overall system to remain consistent, it would be ideal if the change and the publication occurred atomically within the same transaction. This would provide the highest possible level of consistency.

However, this is not easily achievable as database and broker are typically separate systems. To enable a global transaction across both, a transaction manager would be needed.

💡 A local transaction refers to a transaction within a single system (or technology), such as a database. When multiple systems are involved, separate local transactions are created independently. What’s missing is a mechanism to coordinate them. That’s where the transaction manager comes in — a separate system that synchronizes local transactions into a global transaction. This is the only way to achieve exactly once semantics across systems. All other approaches relax this guarantee.

The problem is very similar to ACID and CAP considerations in distributed systems, where trade-offs are made between different properties—or guarantees.

What are these guarantees?

  • Ordering Guarantees
    A guarantee regarding the order of processing—whether it must match the original processing sequence or not.

  • Delivery / Publication Guarantees
    A guarantee regarding the delivery of events—whether events may be delivered multiple times, exactly once, or at most once.

  • Read Your Writes
    A guarantee about immediate consistency — i.e., whether a system can read its own changes right after writing them.

It’s important to note that some guarantees exist along a spectrum. The theory is much more detailed, so this section serves merely as an introduction. There are three approaches to solving this problem. Each fulfills different guarantees and comes with its own trade-offs.

Transaction Manager

A transaction manager coordinates local transactions across independent systems — essentially acting as the orchestrator of a two-phase commit (2PC).

exactly once in-order - A transaction manager provides by far the highest level of consistency between two systems. For this reason, it is often used in banking systems.

Disadvantages

  • Performance
    A transaction manager introduces significant overhead, as it is a separate service that orchestrates a 2PC between systems. In Kafka, for example, using local transactions would also mean that events could no longer be processed in batches.

  • Cost
    Transaction managers are often commercial products — there are few, if any, free alternatives.

  • Support
    The involved systems must implement certain standards to integrate with a transaction manager.

Transactional Outbox Pattern

With a transactional outbox, the problem of independent transactions is deferred by storing events in the database as part of the same local transaction. A separate process reads and publishes these events in order.

at least once in-order – The storage of events happens exactly once, while their publication is at least once. The problem of independent transactions is effectively shifted by this pattern. As long as reading and publishing do not happen concurrently, the event order is preserved.
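
A minimal sketch of a polling outbox in Java (plain JDBC; the table and names are illustrative). The write side runs inside the domain transaction, while the publisher is a separate process:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

// Assumed table (illustrative):
//   CREATE TABLE outbox (id BIGSERIAL PRIMARY KEY, payload TEXT,
//                        published BOOLEAN NOT NULL DEFAULT FALSE);
class Outbox {

    // Write side: called within the SAME local transaction as the domain change.
    void saveEvent(Connection con, String payload) throws SQLException {
        try (PreparedStatement ps =
                con.prepareStatement("INSERT INTO outbox (payload) VALUES (?)")) {
            ps.setString(1, payload);
            ps.executeUpdate();
        }
    }

    // Separate process: publishes unpublished events in insertion order.
    // This is at-least-once: a crash after publishing but before the UPDATE
    // causes a redelivery on the next poll.
    void publishPending(Connection con, Consumer<String> broker) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT id, payload FROM outbox WHERE published = FALSE ORDER BY id");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                broker.accept(rs.getString("payload"));
                try (PreparedStatement upd = con.prepareStatement(
                        "UPDATE outbox SET published = TRUE WHERE id = ?")) {
                    upd.setLong(1, rs.getLong("id"));
                    upd.executeUpdate();
                }
            }
        }
    }
}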

Types of Transactional Outboxes

  • Polling Outbox
    The database is regularly polled for new events. This approach has higher latency and significantly increased resource consumption.

  • Tailing or Subscriber Outbox
    This type subscribes to a change stream of the database. Databases like Redis and MongoDB offer built-in subscriber mechanisms. For others, the write-ahead log can be used to achieve the same effect.

Further Considerations and Challenges

  • Scaling
    As throughput increases, scaling and latency can become issues. Depending on the context, sharding may help—but it’s important that the outbox sharding aligns with the broker’s sharding. Otherwise, event ordering may break.

  • Single Point of Failure
    A single outbox instance represents a single point of failure. Therefore, a cluster with leader election is needed to ensure high availability and low latency.

Event Persistence

Similar to the outbox pattern, events are stored in the database along with an additional status field. However, unlike the outbox, events are published by the same process followed by a status update.

An additional process periodically scans the database for unpublished events and publishes them as a fallback.

at least once out of order – Due to the nature of the publishing mechanism, maintaining the correct order is a best effort.

💡 This is by the way the mechanism Spring Modulith uses for its internal eventing. Something to keep in mind.

Separate Transaction

Another option is to more or less ignore the problem.

at most once in-order – Both local transactions would — if possible — be nested, so that a failure in the inner transaction leads to a rollback of the outer one. However, if the outer transaction fails after the inner one has been executed, this will inevitably lead to inconsistency.

💡 This is exactly what happens when using @Transactional in Spring-Kafka and Spring-Data without an external transaction manager.

9.2 - Databases

1. Concepts

1.1 Isolation Levels

During parallel execution of transactions, a variety of race conditions can occur. Transaction isolation levels are methods used to prevent these. Each level is characterized by the type of race condition it prevents. Moreover, each level also prevents all the race conditions addressed by the previous levels. It is important to note that each database engine implements these levels differently.

                   Dirty Read   Non-Repeatable Read   Write Skew (Phantom)
Read Uncommitted   X            X                     X
Read Committed                  X                     X
Repeatable Read                                       X
Serializable

(X = the race condition can occur at this level)

Read Committed

The lowest of the isolation levels that provides the following guarantees:

  1. All reads operate on data that has been fully committed.
    No dirty reads.
  2. All writes operate on data that has been fully committed.
    No dirty writes.

Without these guarantees, the level is referred to as Read Uncommitted.

Dirty Reads

In a dirty read, a transaction reads the uncommitted changes of ongoing transactions. This allows the reading of intermediate states. If any of these transactions then fail, data that should never have existed would have been read.

Dirty Writes

In a dirty write, an update is made to the uncommitted changes of a concurrently running transaction. This poses a problem because, depending on the timing, only parts of each transaction might be applied.

Example:

  • Buying a car requires updates to two tables — the listing and the invoice. Two interested parties, A and B, try to buy the same car at the exact same time.
    • Transaction A updates the listing, but then briefly pauses (this may happen due to a CPU context switch).
    • Transaction B overwrites A’s update on the listing.
    • Transaction B updates the invoice.
    • Transaction A resumes and overwrites the invoice.
  • Buyer B has purchased the car (last write operation), but buyer A receives the invoice.

Implementation

To implement Read Committed, PostgreSQL, for example, uses Multi-Version Concurrency Control (MVCC) based on timestamps. Each statement only reads data that was committed before it began.

Repeatable Read

The isolation level above Read Committed, which additionally prevents nonrepeatable reads. Also referred to as Snapshot Isolation because each transaction operates on its own snapshot of the database.

Nonrepeatable Reads (or Read Skew)

A nonrepeatable read (also known as read skew) occurs when an aggregate function (such as SUM) is applied to a range of rows that change during its computation — particularly entries that have already been read. This affects inserts, updates, and deletes equally. If re-executing the function yields a different result, it is considered a nonrepeatable read.

Example:

ID Salary
1 1000
2 2000
3 3000
4 2500
5 1000

A session executes a SUM over the salaries. Meanwhile, the employee with ID 3 is deleted — at the moment the computation reaches ID 4. As a result, an incorrect total salary is calculated.

Nonrepeatable reads are especially dangerous for long-running processes that rely on data integrity, such as backups, analytic workloads, and integrity checks.

Serializable

The highest isolation level, where transactions are executed sequentially — at least according to the standard. In reality, databases deviate from this. For example, PostgreSQL uses monitoring to detect conflicts between concurrently running sessions. In the event of a conflict, one of the two transactions is aborted.

Sequential execution prevents lost updates, write skews, and phantoms. However, these issues can also be avoided through proper locking.

Lost Updates

A lost update is a read-modify-write race condition on a shared row. Here, one session reads a value as input for a calculation and then updates it. Meanwhile, a parallel session updates the same value between the read and write operations. This intermediate update is lost.

Example:

ID Value
1 3
  • Session A reads the value 3 and intends to increase it by 2.
  • Between the read and the write, a parallel session writes the value 4.
  • When Session A writes the value 5, the update to 4 is lost.

Solutions include serialized execution, locking, and atomic operations, although the use of atomic operations is limited to specific cases.
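
As a sketch of the atomic-operation fix for the example above (plain JDBC; table and names are illustrative), the increment is pushed into a single UPDATE so that the database performs the read-modify-write atomically:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class Counter {
    // Reading the value into application code and writing it back is racy:
    // a parallel write between the SELECT and the UPDATE would be lost.
    // Letting the database perform the increment avoids that:
    void incrementBy(Connection con, long id, int delta) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE counters SET value = value + ? WHERE id = ?")) {
            ps.setInt(1, delta);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }
    // Alternatively, SELECT ... FOR UPDATE locks the row, so a parallel
    // session blocks until this transaction commits.
}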

Write Skew (Phantoms)

A write skew is a race condition involving reads on shared entries and writes on separate entries. This applies equally to inserts, updates, and deletes.

Example (1) — materialized:

  • A hospital’s shift plan requires that at least one doctor is on call at all times. For one evening, two doctors (A) and (B) are scheduled. Both wish to sign off.
  • (A) and (B) attempt to sign off at the same time. The system checks in parallel whether at least one doctor remains on call — which is true in both cases. Both are removed concurrently.
  • As a result, no doctor remains on call.

Example (2) — unmaterialized:

  • Booking a meeting room is handled through time slots; stored as entries with a start and end time, assigned to a room and a responsible person.
  • Two people (A) and (B) try to book the room at the same time. The system checks if a booking exists for the requested time slot — in both cases, none is found. Two new entries are created simultaneously.

The difference between Example (1) and Example (2) is that in (1) the conflict is materialized, while in (2) it is not. This means in (1) there are existing entries that could be locked; in (2) there are no entries yet — and you cannot lock what doesn’t exist.

Solutions include serialized execution and locks. However, locks require a materialized conflict — or they must be applied to the entire table.

Postgres Specifics

Repeatable Read

In PostgreSQL, Repeatable Read is implemented as snapshot isolation using MVCC. Each transaction sees a snapshot of the database taken at the moment it starts. The following points are important:

  • Locking
    Locks are applied independently of versions. Only entries visible in the transaction’s snapshot are considered — newer entries are ignored.
  • Exception Handling
    When executing updates, deletes, or applying locks, an exception is thrown if a newer version of the affected entries exists outside the snapshot. Read-only queries without locking are not affected.
    On the application side, proper error handling must involve retrying the entire transaction in order to work with a newer snapshot (see the sketch below).
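
A minimal sketch of such a retry loop (plain JDBC; PostgreSQL reports these aborts with SQLSTATE 40001, serialization_failure):

import java.sql.Connection;
import java.sql.SQLException;

class SerializationRetry {

    interface Tx {
        void run(Connection con) throws SQLException;
    }

    // Retries the WHOLE transaction so each attempt works on a fresh snapshot.
    void inTransactionWithRetry(Connection con, Tx tx, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                con.setAutoCommit(false);
                tx.run(con);
                con.commit();
                return;
            } catch (SQLException e) {
                con.rollback();
                boolean serializationFailure = "40001".equals(e.getSQLState());
                if (!serializationFailure || attempt >= maxAttempts) {
                    throw e;
                }
                // otherwise loop and retry against a newer snapshot
            }
        }
    }
}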

Serializable

This isolation level extends the snapshot isolation of Repeatable Read by adding conflict monitoring through predicate locks. True serialization is not achieved; instead, an optimistic approach is used, resolving conflicts by aborting transactions when they are detected.

Access to entries is recorded as predicate locks — visible in pg_locks. If a conflict arises, one of the involved transactions is aborted with an error message. Predicate locks do not play a role in deadlocks!

Important points when using Serializable:

  • Exception Handling and Consistency
    Applications must implement error handling for serialization exceptions by retrying the entire transaction.
  • Read Consistency
    Reads are only considered consistent after a successful commit because a transaction may still be aborted at any time.
    This does not apply to read-only transactions. Their reads are consistent from the moment a snapshot is established; these transactions must be flagged as read-only.
  • Locking
    Explicit locking becomes unnecessary when Serializable is used globally. In fact, for performance reasons, explicit locks should be avoided!
  • Mixing Isolation Levels
    Conflict monitoring only applies to transactions running under Serializable. Therefore, it is recommended to set the isolation level globally.

Important:
Keep in mind that sequential scans and long-running transactions can lead to a high number of aborts as the number of concurrent users increases. Serializable works best with small, fast transactions.

10 - Contribute