Raft Implementation

Following on from the Raft page, let's look at how its implementation in Aeron Cluster compares with the theory.

First, the similarities:

  • The overall organisation is obviously the same - each cluster member has a Log, Consensus Module and State Machine. The State Machine is called a Clustered Service, which is a wrapper around your service code that lets it interact with the Consensus Module and the Log. The Log is held in memory and also written to disk in an Aeron Archive recording.
  • A deterministic state machine is required - it is up to the Clustered Service to behave and only take input from the Log. If the Clustered Service has any interaction with the outside world through any other means, even reading the current time, this compromises determinism, which means there's little point in using Aeron Cluster.
  • Members also have the roles of leader, follower and candidate. They hold elections, and each election starts a new term, which has a term number. Aeron Cluster calls them leadership terms, to distinguish them from other 'terms' in Aeron.

Let's revisit the Overview diagram, now that we've covered Raft.

All the components communicate with each other via Aeron Transport. The sender writes messages into a log buffer (the grey boxes), which the recipient reads directly from if it's on the same machine. If the recipient is on a different machine, Aeron Transport replicates it to a replica of the log buffer (an Image), and the recipient reads it from the replica.

[Diagram: a Leader and two Followers, each containing a Clustered Service (your business logic and state), a Consensus Module (Aeron Cluster), an Aeron Archive with a log recording, and Aeron Transport with a log buffer. Clients A and B connect over the network to the Leader via ingress and egress channels. Counters shown on the followers: AppendPosition: 4, CommitPosition: 4.]


Messages from clients arrive on an Ingress channel on the leader. Clients know about all cluster members and can connect to any of them. All members listen on an Ingress channel, but if a client connects to a follower, the follower replies on the client's Egress channel, redirecting them to the leader (as per Raft).

The Consensus Module takes client messages and puts them into a single sequence in the Log (as per Raft). The Log is an Aeron Transport log buffer - just a piece of shared memory that different processes on the same machine can all access as though it was in their own process.

After the last election, the followers created an Aeron Transport subscription to the leader's Log. This means whenever the leader adds messages to its Log, Aeron Transport automatically replicates them to the followers. This is different from Raft, where the leader makes an explicit AppendEntries RPC call to the followers.

After the last election, each Consensus Module also told its Aeron Archive to copy new messages from its Log to the Log Recording (a file on disk). Raft also requires new messages to be written to disk.

Whenever the Archives write new messages to the Log Recording, they update a counter in shared memory containing the new position they have written up to. Whenever this changes, the followers' Consensus Modules send an AppendPosition message back to the leader, with the new position. This is equivalent to returning a result to a Raft AppendEntries RPC.

As per Raft, the leader's Consensus Module calculates the position that the majority of the members have on disk - the commit position, which is equivalent to Raft's latest committed index. This is sent to the followers in a CommitPosition message whenever it changes (or on a heartbeat), unlike Raft, which piggybacks it on the next AppendEntries RPC.
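The majority calculation can be sketched like this (illustrative code, not the actual Aeron Cluster implementation): each member's latest AppendPosition goes in, including the leader's own, and out comes the highest position that a majority of members have safely on disk.

```java
// Illustrative sketch (not the actual Aeron Cluster code): computing the
// commit position from the AppendPositions reported by each member.
import java.util.Arrays;

public class QuorumPosition {
    static long quorumPosition(long[] appendPositions) {
        long[] sorted = appendPositions.clone();
        Arrays.sort(sorted); // ascending
        // With n members a majority is n/2 + 1, so the commit position is
        // the (n/2 + 1)-th highest append position.
        return sorted[(sorted.length - 1) / 2];
    }

    public static void main(String[] args) {
        // Matches the diagram: leader and one follower at 4, one follower at 2.
        System.out.println(quorumPosition(new long[] {4, 4, 2})); // prints 4
    }
}
```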

Each member has a CommitPosition counter (just a number, held in shared memory). When the followers receive a CommitPosition message, they update the CommitPosition counter. The Clustered Services watch the counter and process messages up to the new counter whenever it changes. Superficially, this is the same as Raft, just using different mechanics.
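The watch-and-process loop can be sketched as follows (illustrative only; the names are not the Aeron API, and an AtomicLong stands in for the real shared-memory counter):

```java
// Illustrative sketch: a Clustered Service duty cycle that polls the
// CommitPosition counter and processes the log up to it.
import java.util.concurrent.atomic.AtomicLong;

public class CommitWatcher {
    private final AtomicLong commitPosition; // written by the Consensus Module
    private long processedPosition = 0;

    public CommitWatcher(AtomicLong commitPosition) {
        this.commitPosition = commitPosition;
    }

    /** One duty cycle: returns the number of newly committed bytes processed. */
    public long doWork() {
        long commit = commitPosition.get();
        long newlyCommitted = commit - processedPosition;
        if (newlyCommitted > 0) {
            // The real service would read the log entries in
            // (processedPosition, commit] here and apply them to its state.
            processedPosition = commit;
        }
        return newlyCommitted;
    }
}
```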

Only the leader responds to clients. Again, same as Raft.


That's it - that's a high level overview of Aeron Cluster. I said you could learn it by stealth by learning the theory!

Position

It's worth pointing out a difference in how Raft and Aeron Cluster refer to messages. Raft refers to messages based on their index in the Log, i.e. message 0, 1, 2, etc. Aeron Cluster uses 'position', which is a 64-bit byte index into the Log. It starts at zero and increases by the length of each message, plus some overhead for a message header and some padding at the end.
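The position arithmetic can be sketched like this. The 32-byte header and 32-byte frame alignment are assumptions in this sketch (in Aeron the real constants are DataHeaderFlyweight.HEADER_LENGTH and FrameDescriptor.FRAME_ALIGNMENT), but the shape of the calculation is the point:

```java
// Sketch of how a position advances per message: header + payload, rounded
// up to the frame alignment, with the rounding accounting for the padding.
public class LogPosition {
    static final int HEADER_LENGTH = 32;    // assumed message header size
    static final int FRAME_ALIGNMENT = 32;  // assumed frame alignment

    /** Position after appending a message with the given payload length. */
    static long nextPosition(long position, int messageLength) {
        int frameLength = HEADER_LENGTH + messageLength;
        // Round the frame up to the next alignment boundary; the difference
        // is the padding at the end of the message.
        int alignedLength = (frameLength + FRAME_ALIGNMENT - 1) & -FRAME_ALIGNMENT;
        return position + alignedLength;
    }
}
```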

Raft has a commitIndex to refer to the last committed entry. Aeron Cluster has a CommitPosition Counter, which is a byte index that points to the end of the last committed message (including padding).

'Position' in Aeron Cluster never resets (unless you delete all the data). If the Log Recording contains several Segment files and the oldest Segment file is detached and deleted, the position of the latest entries continues to increase regardless. The offset of a message in a buffer can be found by using 'position modulo buffer size'.
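Since Aeron buffer lengths are powers of two, that modulo can be sketched as a bit mask (illustrative code, not the Aeron API):

```java
// Sketch: mapping an ever-increasing position onto an offset in the buffer,
// using 'position modulo buffer size' as described above.
public class PositionOffset {
    /** bufferLength must be a power of two. */
    static int offsetInBuffer(long position, int bufferLength) {
        return (int) (position & (bufferLength - 1));
    }
}
```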