Cluster Files

Aeron Cluster uses Aeron Transport and Aeron Archive, each of which creates a number of files. On top of that, it creates a few files of its own. Here is an example of the important files on a typical Cluster Member:

Aeron Transport

All the grey boxes in the diagram are files created by Aeron Transport. They are shared memory files that are transient, in that they are deleted each time the Media Driver starts.

the cnc.dat file contains all the Counters created by Aeron Transport, Archive and Cluster
the loss-report.dat records any message loss
the remaining files are log buffers, used for different streams of messages. Low-level details of the structure of the log buffer files are described in Log Buffers. High-level details of how Aeron Cluster uses them for different message streams are described in the Detailed Overview

Aeron Archive

Archive files are regular files on disk that persist across cluster restarts.

The most important Archive file created for Aeron Cluster is the Log Recording, which is the Raft Log - the single sequence of messages to be processed by the Cluster. Large Recordings end up being split up across multiple Segment files, so a Recording may appear as multiple files on disk. The Log grows forever, so having the Recording split across multiple Segment files provides a way of archiving and removing the oldest parts.
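The idea of splitting a Recording into Segment files can be illustrated with a little arithmetic. The following is a minimal sketch, not the Archive's implementation: it assumes the Recording starts at position 0 and that every Segment file has the same fixed length (128 MiB is used here purely as an example value). The class and method names are hypothetical.

```java
final class SegmentSketch {
    // Assumed example value: a fixed segment length of 128 MiB.
    static final long SEGMENT_LENGTH = 128L * 1024 * 1024;

    /** Position at which the segment containing 'position' begins (recording assumed to start at 0). */
    static long segmentBasePosition(long position, long segmentLength) {
        return position - (position % segmentLength);
    }

    /** Number of whole segments that lie entirely below 'oldestNeededPosition'. */
    static long removableSegmentCount(long oldestNeededPosition, long segmentLength) {
        return oldestNeededPosition / segmentLength;
    }

    public static void main(String[] args) {
        long position = 300_000_000L;
        System.out.println("segment base: " + segmentBasePosition(position, SEGMENT_LENGTH));
        System.out.println("removable segments: " + removableSegmentCount(position, SEGMENT_LENGTH));
    }
}
```

Under these assumptions, a log position of 300,000,000 falls in the third segment, so the first two segments hold only older data and are the candidates for archiving or removal.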

If the user chooses to snapshot the Cluster, the snapshots are also stored in Recordings. There may be more than one Clustered Service. When a snapshot happens, the Consensus Module and the Clustered Service(s) each write their snapshot to their own Recording, so each snapshot consists of a set of Recordings at the same Log position.

The Archive also has an archive.catalog file and an archive-mark.dat file, which are described in Archive Files.

Consensus Module

The location of the Consensus Module files is configured by the clusterDir configuration in the Consensus Module Context. It is described in the Cluster Tutorial.

In that directory, it creates three files:

cluster-mark.dat - this is the mark file, which stops a second instance of the Consensus Module from starting in the same clusterDir. It also stores metadata about the cluster and a distinct error log
recording.log - this is the cluster's index into the Recordings it has created in the Archive
node-state.dat - this stores node-specific data, which can't be stored in a snapshot because snapshots are identical across all nodes

cluster-mark.dat

This is a mark file for the Consensus Module, which contains two sections.

The first section contains information about the cluster, including:

the 'mark' activity timestamp
the control channel, consensus module stream id and service stream id - these are for the IPC channels between the Consensus Module and Clustered Service, and are read by ClusterTool so it can also connect to the Consensus Module

The full contents of the first section are defined by the MarkFileHeader in the aeron-cluster-mark-codecs.xml.

The second section is a buffer used for an error log. It is wrapped with a DistinctErrorLog, like the one in Aeron Transport's cnc.dat file. Any errors that occur within Aeron Cluster are recorded there.

The mark file is used in the same way as Aeron Archive's mark file: it stops a second instance of Aeron Cluster from starting with the same cluster directory. Multiple instances of Aeron Cluster can be run on the same machine by using different cluster directories.

recording.log

The Consensus Module stores the Log and any snapshots as Recordings in the Archive. The Archive has a catalog for the Recordings, but doesn't know what's in each of them. This is what the recording.log file is for. The Consensus Module uses it to track which Recording is the Log Recording, and where each of the leadership terms starts and ends within it. It also contains a list of all the snapshot Recordings, which the Consensus Module uses to find the latest snapshot.

For each leadership term and snapshot, the Consensus Module adds an Entry to recording.log, containing several fields like the recordingId, leadershipTermId, termBaseLogPosition (start position) and current logPosition. The format of the file is described in the RecordingLog class, which is used to read / update the file.
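To make the "find the latest snapshot" idea concrete, here is a minimal sketch in plain Java. It is not the RecordingLog class and does not read the real file: the Entry class below is a hypothetical, cut-down model holding only a few of the fields named above, and the selection rule (highest logPosition among valid SNAPSHOT entries) is an assumption made for illustration. The values in main are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class LatestSnapshotSketch {
    /** Hypothetical, simplified model of one recording.log entry. */
    static final class Entry {
        final long recordingId;
        final long logPosition;
        final int serviceId;   // -1 means "not for a specific service"
        final String type;     // "TERM" or "SNAPSHOT"
        final boolean isValid;

        Entry(long recordingId, long logPosition, int serviceId, String type, boolean isValid) {
            this.recordingId = recordingId;
            this.logPosition = logPosition;
            this.serviceId = serviceId;
            this.type = type;
            this.isValid = isValid;
        }

        boolean isUsableSnapshot() {
            return isValid && "SNAPSHOT".equals(type);
        }
    }

    /** Returns every snapshot entry taken at the highest snapshot log position. */
    static List<Entry> latestSnapshot(List<Entry> entries) {
        long latestPosition = -1;
        for (Entry e : entries) {
            if (e.isUsableSnapshot()) {
                latestPosition = Math.max(latestPosition, e.logPosition);
            }
        }
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.isUsableSnapshot() && e.logPosition == latestPosition) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry(0, 2000, -1, "TERM", true),
            new Entry(1, 1000, 0, "SNAPSHOT", true),
            new Entry(2, 1000, -1, "SNAPSHOT", true),
            new Entry(3, 3000, 0, "SNAPSHOT", true),
            new Entry(4, 3000, -1, "SNAPSHOT", true));
        for (Entry e : latestSnapshot(entries)) {
            System.out.println("recordingId=" + e.recordingId + " serviceId=" + e.serviceId);
        }
    }
}
```

The result is a set of entries rather than a single one, because a snapshot is made up of one Recording for the Consensus Module plus one for each Clustered Service, all at the same log position.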

Example

Consider a scenario that starts a 3-node cluster with 2 clients, takes a snapshot, kills the leader, waits for a new leader to be elected, restarts the old leader, then takes another snapshot.

At the end, running ClusterTool recording-log on one of the cluster members prints the Recording Log's contents (each cluster member should have the same Recordings in this scenario). Here is the output, reformatted slightly for readability:

Entry{recordingId=0, leadershipTermId=0, termBaseLogPosition=0, logPosition=1504, timestamp=1745823206184, serviceId=-1, type=TERM, isValid=true, entryIndex=0}
Entry{recordingId=1, leadershipTermId=0, termBaseLogPosition=0, logPosition=1408, timestamp=1745823209763, serviceId=0, type=SNAPSHOT, isValid=true, entryIndex=1}
Entry{recordingId=2, leadershipTermId=0, termBaseLogPosition=0, logPosition=1408, timestamp=1745823209763, serviceId=-1, type=SNAPSHOT, isValid=true, entryIndex=2}
Entry{recordingId=0, leadershipTermId=1, termBaseLogPosition=1504, logPosition=-1, timestamp=1745823221315, serviceId=-1, type=TERM, isValid=true, entryIndex=3}
Entry{recordingId=3, leadershipTermId=1, termBaseLogPosition=1504, logPosition=4960, timestamp=1745823230353, serviceId=0, type=SNAPSHOT, isValid=true, entryIndex=4}
Entry{recordingId=4, leadershipTermId=1, termBaseLogPosition=1504, logPosition=4960, timestamp=1745823230353, serviceId=-1, type=SNAPSHOT, isValid=true, entryIndex=5}

In a table, it looks like this:

recordingId	leadershipTermId	termBaseLogPosition	logPosition	timestamp	serviceId	type	isValid	entryIndex
0	0	0	1504	1745823206184	-1	TERM	true	0
1	0	0	1408	1745823209763	0	SNAPSHOT	true	1
2	0	0	1408	1745823209763	-1	SNAPSHOT	true	2
0	1	1504	-1	1745823221315	-1	TERM	true	3
3	1	1504	4960	1745823230353	0	SNAPSHOT	true	4
4	1	1504	4960	1745823230353	-1	SNAPSHOT	true	5

The first row is for leadership term 0 in the Log Recording, which has recordingId 0 in the Archive. It starts at log position 0 and ends at 1504. It's not for a specific service, so serviceId is the null value -1.
The second row was added when the first snapshot was taken. It is for the snapshot of the Clustered Service, and was taken when the log position was 1408, within leadership term 0. The service has id 0. If there were more than one Clustered Service, there would be a separate Recording and row for each of them.
The third row is for the snapshot of the Consensus Module, which is taken at the same time (serviceId is the null value -1). It is stored in its own Recording, with recordingId 2.
The fourth row was added when the leader was killed and the remaining followers elected a new leader in leadership term 1. Note that it has the same recordingId as the first row - all the leadership terms for the Log are stored in the same Recording. The termBaseLogPosition (start position) is 1504. The logPosition field is left unset while the leadership term is active. When the next leadership term starts, the logPosition of the previous one is filled in, so logPosition in the first row was set to 1504 when row 4 was added.
The fifth and sixth rows are for the second snapshot, which was taken at log position 4960.
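The reading of the TERM rows above can be sketched in code. The following plain-Java example parses lines in the textual form printed by ClusterTool recording-log (as shown in this example) and reports the log position range of each leadership term, treating a logPosition of -1 as "still open". The class and method names are hypothetical, and the parsing relies only on the printed format shown above, not on any Aeron API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordingLogOutputSketch {
    // Matches each "name=value" pair inside "Entry{...}".
    private static final Pattern FIELD = Pattern.compile("(\\w+)=([^,}]+)");

    /** Splits one "Entry{name=value, ...}" line into a name -> value map. */
    static Map<String, String> parseEntry(String line) {
        Map<String, String> fields = new HashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2).trim());
        }
        return fields;
    }

    /** Describes each TERM entry as a position range; -1 means the term has not ended yet. */
    static List<String> termRanges(List<String> lines) {
        List<String> ranges = new ArrayList<>();
        for (String line : lines) {
            Map<String, String> e = parseEntry(line);
            if (!"TERM".equals(e.get("type"))) {
                continue; // skip SNAPSHOT entries
            }
            long start = Long.parseLong(e.get("termBaseLogPosition"));
            long end = Long.parseLong(e.get("logPosition"));
            String endText = end == -1 ? "open" : String.valueOf(end);
            ranges.add("term " + e.get("leadershipTermId") + ": " + start + ".." + endText);
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Entry{recordingId=0, leadershipTermId=0, termBaseLogPosition=0, logPosition=1504, "
                + "timestamp=1745823206184, serviceId=-1, type=TERM, isValid=true, entryIndex=0}",
            "Entry{recordingId=1, leadershipTermId=0, termBaseLogPosition=0, logPosition=1408, "
                + "timestamp=1745823209763, serviceId=0, type=SNAPSHOT, isValid=true, entryIndex=1}",
            "Entry{recordingId=0, leadershipTermId=1, termBaseLogPosition=1504, logPosition=-1, "
                + "timestamp=1745823221315, serviceId=-1, type=TERM, isValid=true, entryIndex=3}");
        termRanges(lines).forEach(System.out::println);
    }
}
```

For the entries in main, this reports term 0 as covering 0..1504 and term 1 as starting at 1504 and still being open, matching the description of rows one and four.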

node-state.dat

As per the javadoc in NodeStateFile, this is

An extensible list of information relating to a specific cluster node. Used to track persistent state that is node specific and shouldn't be present in the snapshot, e.g. candidateTermId.

The javadoc also describes the contents of the file and its high-level structure. The details of what is stored can be found in aeron-cluster-node-state-codecs.xml.

Clustered Service

The location of the Clustered Service files is specified by the clusterDir configuration on the Clustered Service Context. It is described in the Cluster Tutorial. In the tutorial, both the Consensus Module and Clustered Service Contexts are configured with the same cluster directory, but they can be different.

cluster-mark-service-0.dat

This is the mark file for the first Clustered Service, where the '0' is the service id. If there was more than one Clustered Service, there would be a mark file for each (service ids 0, 1, 2, etc.).

This mark file has the same structure as the cluster mark file.
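As a quick summary of the file names covered on this page, the sketch below lists the files one would expect to find in a cluster directory, assuming the Consensus Module and all Clustered Services share the same directory (as in the tutorial) and that service ids run from 0 upwards. The class and method names are hypothetical, and the code only builds names; it does not inspect a real directory.

```java
import java.util.ArrayList;
import java.util.List;

final class ClusterDirFilesSketch {
    /** Expected file names for a shared cluster directory with 'serviceCount' Clustered Services. */
    static List<String> expectedFiles(int serviceCount) {
        List<String> names = new ArrayList<>(
            List.of("cluster-mark.dat", "recording.log", "node-state.dat"));
        for (int serviceId = 0; serviceId < serviceCount; serviceId++) {
            // One mark file per Clustered Service, named after its service id.
            names.add("cluster-mark-service-" + serviceId + ".dat");
        }
        return names;
    }

    public static void main(String[] args) {
        expectedFiles(2).forEach(System.out::println);
    }
}
```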