Consensus Messages¶
Most Consensus messages are sent between the Consensus Modules. They are primarily used to achieve consensus on the Log, or to communicate during an Election. Each cluster member keeps a Consensus Publication to each of the others. There are some other messages sent using the Consensus streamId that come from external tooling / frameworks - these are highlighted in the list below.
The Consensus messages are described below.
- CanvassPosition
- RequestVote
- Vote
- NewLeadershipTerm
- AppendPosition
- CommitPosition
- CatchupPosition
- StopCatchup
- TerminationPosition
- TerminationAck
- BackupQuery (from ClusterBackupAgent)
- BackupResponse (to ClusterBackupAgent)
- Challenge (to ClusterBackupAgent)
- ChallengeResponse (from ClusterBackupAgent)
- HeartbeatRequest (from Cluster Standby)
- HeartbeatResponse (to Cluster Standby)
- StandbySnapshot (part of Cluster Standby)
CanvassPosition¶
Sent by a Consensus Module's Election object when it's in the CANVASS or NOMINATE state, to all the other cluster members. It publishes the member's Log Recording append position, so they can all see which member has the most recent state, prior to electing a new leader.
CanvassPosition {
logLeadershipTermId long // leadership term of the last message in the Log Recording
logPosition long // position of the last message in the Log Recording
leadershipTermId long // current leadership term of the member
followerMemberId int // this member's id, so the recipients know the sender
protocolVersion int // consensus module version
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onCanvassPosition(); otherwise, the ConsensusModuleAgent returns a NewLeadershipTerm to the sender, which tells it to join the current leadership term.
Example: first CanvassPosition after starting up a new cluster member with no state:
CanvassPosition {
logLeadershipTermId: -1 // nothing in the Log Recording
logPosition : 0 // never received any Log messages
leadershipTermId : -1 // current term (not in one, as not yet been in an Election)
followerMemberId : 1 // member id
protocolVersion : 65536 // hardcoded version
}
See onCanvassPosition to see how it's handled during an Election.
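As an illustration of the comparison that canvassing enables, here's a minimal sketch in plain Java (not Aeron's actual code - the class and method names are made up). The leadership term id of the last Log entry is compared first, and the position only breaks ties within the same term.
final class CanvassComparison {
    // Returns true if log state A (termIdA, positionA) is at least as recent as log state B.
    static boolean isAtLeastAsRecent(long termIdA, long positionA, long termIdB, long positionB) {
        if (termIdA != termIdB) {
            return termIdA > termIdB; // a later leadership term always wins
        }
        return positionA >= positionB; // same term: the higher append position wins
    }

    public static void main(String[] args) {
        // A freshly started member (termId -1, position 0) is behind a member that has
        // appended Log entries in leadership term 0.
        System.out.println(isAtLeastAsRecent(-1, 0, 0, 2848)); // false
        System.out.println(isAtLeastAsRecent(0, 2848, -1, 0)); // true
    }
}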
RequestVote¶
Sent by a Consensus Module's Election object when it's in the CANDIDATE_BALLOT state, to request that the other members vote for it. This is when the cluster member has received CanvassPosition messages from the other members and thinks it has the latest or joint-latest state in its Log Recording, so it is a candidate to become leader.
RequestVote {
logLeadershipTermId long // leadership term of the last message in the Log Recording
logPosition long // position of the last message in the Log Recording
candidateTermId long // proposed next leadership term
candidateMemberId int // this member's id, so the recipients know the sender
protocolVersion int // consensus module version
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onRequestVote(); otherwise, if the candidateTermId is higher than its current leadership term, it enters an Election itself.
Example: first RequestVote after starting up a new cluster member with no state:
RequestVote {
logLeadershipTermId: -1 // nothing in the Log Recording
logPosition : 0 // never received any Log messages
candidateTermId : 0 // proposed next leadership term
candidateMemberId : 1 // sender's member id
protocolVersion : 65536 // hardcoded version
}
See onRequestVote to see how it's handled during an Election.
Vote¶
Sent by a Consensus Module's Election object in response to a RequestVote.
Vote {
candidateTermId long // from RequestVote
logLeadershipTermId long // sender's leadership term of the last message in the Log Recording
logPosition long // sender's position of the last message in the Log Recording
candidateMemberId int // from RequestVote
followerMemberId int // this member's id
vote boolean // voting intention
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onVote(), which records the vote for the cluster member; otherwise it's ignored.
See onVote to see how it's handled during an Election.
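For illustration, here's a small sketch of tallying Vote messages and checking for a majority. It isn't Aeron's Election code (the names are hypothetical), but it shows the basic rule: a candidate needs positive votes from more than half of the cluster members, counting its own.
import java.util.HashMap;
import java.util.Map;

final class VoteTally {
    private final int clusterSize;
    private final Map<Integer, Boolean> votesByMemberId = new HashMap<>();

    VoteTally(int clusterSize, int candidateMemberId) {
        this.clusterSize = clusterSize;
        votesByMemberId.put(candidateMemberId, Boolean.TRUE); // a candidate votes for itself
    }

    void onVote(int followerMemberId, boolean vote) {
        votesByMemberId.put(followerMemberId, vote); // record the follower's voting intention
    }

    boolean hasMajority() {
        long positiveVotes = votesByMemberId.values().stream().filter(v -> v).count();
        return positiveVotes > clusterSize / 2;
    }

    public static void main(String[] args) {
        VoteTally tally = new VoteTally(3, 1);   // 3-node cluster, member 1 is the candidate
        tally.onVote(2, true);                   // one positive vote from a follower: 2 of 3
        System.out.println(tally.hasMajority()); // true
    }
}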
NewLeadershipTerm¶
This is sent by the leader Consensus Module in three circumstances:
- when it has won the Election, to notify the other members. It describes the leadership term that it is leading. The same message is sent to all followers every 200 ms, while in LEADER_LOG_REPLICATION, LEADER_REPLAY, and LEADER_READY
- it can be sent as a reply, specifically to one member, if the member starts and sends a CanvassPosition message while the other members are already running. The leader doesn't enter into an Election itself. It echoes the logLeadershipTermId field from the CanvassPosition message and returns information about the next leadership term after that one, and about the current leadership term. The member uses this information to start the process of back-filling missing Log entries
- it can be sent as a reply, specifically to one member, if the member sends a RequestVote message. To be honest, I'm not sure how this situation can arise! todo: investigate
These are the fields and how they are used when sent in response to a CanvassPosition (see Replication and Catchup Details for a worked example):
NewLeadershipTerm {
logLeadershipTermId long // echoed from CanvassPosition - the next 3 fields are relative to this term
nextLeadershipTermId long // next term's id
nextTermBaseLogPosition long // next term's start position
nextLogPosition long // next term's log position (end position)
--- above here is in response to a CANVASS
leadershipTermId long // the current term's id
termBaseLogPosition long // the current term's start position
logPosition long // the current log position (log publisher position)
leaderRecordingId long // the current log recordingId on the leader
timestamp long // current time on the leader (epoch time in cluster time units)
leaderMemberId int // leader's id
logSessionId int // sessionId of the log Publication
appVersion int // consensus module version
isStartup boolean // FALSE when sent in reply to a CanvassPosition
}
When not sent in response to a CanvassPosition, the first 4 fields refer to the leadership term that is about to start.
As a concrete example, these are all the NewLeadershipTerm messages sent in the Follower offline deep dive:
- 1) is where the 3 node cluster started for the first time and node 1 became leader in leadership term 0
- 8) is after node 0 was killed and nodes 1 and 2 restarted without it - node 1 became leader again in leadership term 1
- 13) is after nodes 1 and 2 restarted without node 0 - node 2 became leader in leadership term 2
- 18) is after nodes 1 and 2 restarted without node 0 - node 1 became leader in leadership term 3
- 22) is after node 0 started and went through the replication and catchup process, progressing its way through the leadership terms
1) in LEADER_LOG_REPLICATION, to nodes 0 and 2: payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, to nodes 0 and 2: payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
in LEADER_READY, to nodes 0 and 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=0 |timestamp=1737306778547|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=0 |nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=0 |timestamp=1737306778547|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
8) in LEADER_LOG_REPLICATION, to node 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850311|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850312|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850335|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
in LEADER_READY, to node 2: payload: logLeadershipTermId=1 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850656|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
13) in LEADER_LOG_REPLICATION, to node 1: payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917688|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 1: payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917688|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917698|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
in LEADER_READY, to node 1: payload: logLeadershipTermId=2 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306918111|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
18) in LEADER_LOG_REPLICATION, to node 2: payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985374|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 2: payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985374|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985388|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
in LEADER_READY, to node 2: payload: logLeadershipTermId=3 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985861|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
22) onCanvassPosition, to node 0 (all 4): payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=4064|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989652|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=4064|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989744|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=5280|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989946|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=6592|leaderRecordingId=0 |timestamp=1737306989974|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
It is received in the Consensus Module by the ConsensusAdapter and passed to the ConsensusModuleAgent. If the Consensus Module is in an Election at this point, it accepts the new leader and moves to the appropriate Election state as a follower. This could be FOLLOWER_LOG_REPLICATION if it is behind with the Log, or FOLLOWER_REPLAY to start replaying the Log, etc.
See onNewLeadershipTerm to see how it's handled during an Election.
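To make the follower-side decision concrete, here's a rough sketch of the kind of check a follower makes when accepting a NewLeadershipTerm. The enum and condition are simplifications assumed for the example, not the real Election logic - see onNewLeadershipTerm for that.
final class FollowerStateChoice {
    enum ElectionState { FOLLOWER_LOG_REPLICATION, FOLLOWER_REPLAY }

    // If the follower's Log Recording hasn't reached the base position of the leader's
    // current term, it must first replicate the missing earlier term(s) from the leader;
    // otherwise it can go straight to replaying its own Log Recording.
    static ElectionState onNewLeadershipTerm(long followerAppendPosition, long termBaseLogPosition) {
        return followerAppendPosition < termBaseLogPosition
            ? ElectionState.FOLLOWER_LOG_REPLICATION
            : ElectionState.FOLLOWER_REPLAY;
    }

    public static void main(String[] args) {
        // Node 0 in the worked example above: its Log stops at 2848, but the leader's
        // current term (term 3) starts at 5280, so it must replicate first.
        System.out.println(onNewLeadershipTerm(2848, 5280)); // FOLLOWER_LOG_REPLICATION
    }
}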
AppendPosition¶
Sent by the follower Consensus Modules to tell the leader Consensus Module the follower's append position, i.e. how much of the Log it has written to disk in its Log Recording. It is sent from the Consensus Module's regular duty cycle, whenever the append position has changed or on the leader heartbeat interval (200 ms).
It is also sent from a few states in an Election, to confirm to the leader where the follower is up to. Note that the leadershipTermId field refers to the leadership term that the follower is in. It does not refer to the leadership term of the Log entry at logPosition. If the follower is Replicating missing Log entries, the Log entries will be from an earlier leadership term.
AppendPosition {
leadershipTermId long // current leadership term on the follower
logPosition long // follower's append position
followerMemberId int // follower's id
flags byte // used to indicate whether the follower is in FOLLOWER_CATCHUP
}
It is received in the leader's Consensus Module by the ConsensusAdapter, and is ultimately stored against the member. The leader's Consensus Module will use this when calculating the commit position.
See onAppendPosition to see how it's handled during an Election.
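As a rough illustration of the duty-cycle rule above (send when the append position has changed, or at least every heartbeat interval), here's a minimal sketch; the class and names are hypothetical.
final class AppendPositionSender {
    private static final long HEARTBEAT_INTERVAL_NS = 200_000_000L; // 200 ms

    private long lastSentPosition = -1;
    private long deadlineNs = 0;

    // Called from the duty cycle; returns true if an AppendPosition should be published now.
    boolean shouldSend(long appendPosition, long nowNs) {
        if (appendPosition != lastSentPosition || nowNs >= deadlineNs) {
            lastSentPosition = appendPosition;
            deadlineNs = nowNs + HEARTBEAT_INTERVAL_NS;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AppendPositionSender sender = new AppendPositionSender();
        System.out.println(sender.shouldSend(2848, 0));            // true  - position not yet reported
        System.out.println(sender.shouldSend(2848, 50_000_000L));  // false - unchanged, within 200 ms
        System.out.println(sender.shouldSend(2848, 250_000_000L)); // true  - heartbeat interval elapsed
    }
}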
CommitPosition¶
Sent by the leader Consensus Module to tell the follower Consensus Modules the new commit position, i.e. the Log position that a majority of the cluster members have written to disk in their Log Recording. This is the position that the Clustered Services are safe to process up to.
CommitPosition {
leadershipTermId long // current leadership term on the leader
logPosition long // the commit position, calculated from a majority of the AppendPositions
leaderMemberId int // leader's id
}
It is received in the follower Consensus Modules by the ConsensusAdapter. The follower Consensus Module processes the Log up to the new commit position, then sets the commit-pos Counter, which the Clustered Service uses as a limit when it processes the Log.
See onCommitPosition to see how it's handled during an Election.
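To make the 'majority have appended it' rule concrete, here's a small sketch of deriving a commit position from the latest append positions of all members (including the leader's own). It's just the underlying idea, not the Consensus Module's actual calculation.
import java.util.Arrays;

final class CommitPositionCalc {
    // The commit position is the highest Log position that a majority of members have appended.
    static long quorumPosition(long[] appendPositions) {
        long[] sorted = appendPositions.clone();
        Arrays.sort(sorted); // ascending
        // With n members, a majority is n/2 + 1, so take the (n/2 + 1)-th highest value,
        // which after an ascending sort sits at index (n - 1) - n/2.
        return sorted[(sorted.length - 1) - sorted.length / 2];
    }

    public static void main(String[] args) {
        // Leader and one follower at 6496, one follower still catching up at 2848:
        // two of the three members have reached 6496, so that is the commit position.
        System.out.println(quorumPosition(new long[] { 6496, 2848, 6496 })); // 6496
    }
}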
CatchupPosition¶
Sent by a follower Consensus Module's Election object when it's in the FOLLOWER_CATCHUP_INIT state. This happens when the leader has been running in the current leadership term without the follower, so the follower has to catch up within the current leadership term.
The request asks the leader to have its Archive set up a Replay of the Log, starting from the logPosition in the message, to the catchupEndpoint in the message, which the follower's Consensus Module will be listening on.
CatchupPosition {
leadershipTermId long // current leadership term on the follower
logPosition long // position to replay the leader's Log from
followerMemberId int // follower's id
catchupEndpoint String // follower's catchup endpoint that the leader's Archive will replay to
}
It is received in the leader's Consensus Module by the ConsensusAdapter. The leader asks its Archive to replay the Log to the follower's catchup endpoint.
See onCatchupPosition to see how it's handled during an Election.
StopCatchup¶
Sent by the leader Consensus Module to a follower Consensus Module to tell the follower to stop listening on the Catchup endpoint. While a follower is catching up from the leader's Log, it is subscribed to an Archive Replay of the leader's Log, which goes to the Catchup endpoint on the follower, and to the leader's live Log Publication, which goes to a live Log endpoint on the follower. The follower sends AppendPosition messages to the leader with the 'catchup' flag set. When the leader receives these, it checks whether the follower's position has caught up with its own. When it has, the leader tells its Archive to stop replaying the Log to the follower, and sends StopCatchup to the follower, telling it to remove its Catchup endpoint.
StopCatchup {
leadershipTermId long // current leadership term on the follower
followerMemberId int // follower's id
}
It is received in the follower's Consensus Module by the ConsensusAdapter. The follower removes its catchup endpoint from its Log Subscription, so its Log Subscription is only receiving Log messages from the live Log Publication. This is 'replay merge', and the follower has now caught up.
See onStopCatchup to see how it's handled during an Election.
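Putting the leader-side check above into a sketch: when an AppendPosition arrives with the catchup flag set and the follower has reached the leader's position, the leader stops the replay and sends StopCatchup. The interfaces below are hypothetical placeholders, not Aeron classes.
final class CatchupMonitor {
    interface ArchiveControl { void stopReplay(int followerMemberId); }
    interface ConsensusPublisher { void stopCatchup(long leadershipTermId, int followerMemberId); }

    private final ArchiveControl archive;
    private final ConsensusPublisher publisher;

    CatchupMonitor(ArchiveControl archive, ConsensusPublisher publisher) {
        this.archive = archive;
        this.publisher = publisher;
    }

    // Called when the leader receives an AppendPosition from a catching-up follower.
    void onAppendPosition(long leadershipTermId, long followerPosition, int followerMemberId,
                          boolean catchupFlag, long leaderPosition) {
        if (catchupFlag && followerPosition >= leaderPosition) {
            archive.stopReplay(followerMemberId);                      // stop replaying the Log to the follower
            publisher.stopCatchup(leadershipTermId, followerMemberId); // tell it to drop its Catchup endpoint
        }
    }
}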
TerminationPosition¶
Sent by the leader Consensus Module to the follower Consensus Modules to tell them to perform an orderly shutdown after reaching a particular Log position, because the leader is shutting down. The shutdown is initiated using ClusterTool, which can ask the cluster to shut down (with a snapshot) or abort (without a snapshot).
TerminationPosition {
leadershipTermId long // current leadership term on the leader
logPosition long // the Log position to reach before terminating
}
It is received in the follower Consensus Modules by the ConsensusAdapter. It sets a terminationPosition field in the ConsensusModuleAgent, which it checks on its duty cycle. When found, it sends a ServiceTerminationPosition to the Clustered Service, asking it to perform an orderly shutdown at the Log position. The Clustered Service does so, then returns a ServiceAck, confirming the Log position. The Consensus Module sends a TerminationAck to the leader Consensus Module, records the Log position in the RecordingLog, then terminates.
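As a condensed sketch of that shutdown sequence (with hypothetical placeholder interfaces, not the real Consensus Module plumbing):
final class TerminationFlow {
    interface ServiceProxy { void terminationPosition(long logPosition); }      // ServiceTerminationPosition to the service
    interface ConsensusPublisher { void terminationAck(long leadershipTermId, long logPosition, int memberId); }
    interface RecordingLog { void recordTerminationPosition(long logPosition); }

    static void shutDown(long leadershipTermId, long terminationPosition, int memberId,
                         ServiceProxy service, ConsensusPublisher publisher, RecordingLog recordingLog) {
        service.terminationPosition(terminationPosition);  // 1. ask the Clustered Service to stop at the position
        // 2. the Clustered Service processes the Log up to the position, then replies with a ServiceAck
        publisher.terminationAck(leadershipTermId, terminationPosition, memberId); // 3. acknowledge to the leader
        recordingLog.recordTerminationPosition(terminationPosition);               // 4. record the position
        // 5. terminate this member
    }
}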
TerminationAck¶
Sent by a follower Consensus Module to the leader Consensus Module to acknowledge that it (and its Clustered Service) has reached the Log position specified in a TerminationPosition, and is about to terminate.
TerminationAck {
leadershipTermId long // current leadership term on the follower
logPosition long // current Log position on the follower
memberId int // follower's id
}
It is received in the leader Consensus Module by the ConsensusAdapter. Once all the followers have acknowledged, the leader terminates.
BackupQuery¶
Sent by ClusterBackupAgent to one of the Consensus Modules (chosen at random), to query information about snapshots.
BackupQuery {
correlationId long // to return in the response
responseStreamId int // to connect back to ClusterBackupAgent to send the response
version int // consensus module version
responseChannel String // to connect back to ClusterBackupAgent to send the response
encodedCredentials byte[] // set if authentication is enabled, to authenticate ClusterBackupAgent
}
It is received in the Consensus Module by the ConsensusAdapter. It calls into onBackupQuery(), which adds a ClusterSession into the Consensus Module's pendingBackupSessions with an Action of BACKUP (unlike client sessions, which have an Action of CLIENT). Like client sessions, if authentication is configured, the credentials are checked, which might cause a Challenge to be sent, although this is sent on the responseStreamId specified in the BackupQuery, which happens to be the Consensus streamId 108, not the Egress. Once authenticated, the Consensus Module returns a BackupResponse, using information from the RecordingLog and RecoveryPlan, then removes the ClusterSession.
BackupResponse¶
Sent by a Consensus Module in response to a BackupQuery, returning it to the responseChannel and responseStreamId in the query. This includes all relevant information for a backup operation.
BackupResponse {
correlationId long // from the BackupQuery
logRecordingId long // from the RecoveryPlan
logLeadershipTermId long // from the RecoveryPlan
logTermBaseLogPosition long // from the RecoveryPlan
lastLeadershipTermId long // from the last entry in the RecordingLog
lastTermBaseLogPosition long // from the last entry in the RecordingLog
commitPositionCounterId int // counter id on the consensus module
leaderMemberId int // leader's id
memberId int // of the consensus module
snapshots [ // snapshots for consensus module and clustered service(s)
recordingId long // of the last snapshot
leadershipTermId long // of the last snapshot
termBaseLogPosition long // of the last snapshot
logPosition long // of the last snapshot
timestamp long // of the last snapshot (epoch time in cluster time units)
serviceId int // of the last snapshot
]
clusterMembers String // endpoints string
}
It is received by the ClusterBackupAgent, which stores the response data and moves on to the next stage of the backup process.
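For completeness, here's a tiny sketch of the correlationId pattern that ties a BackupQuery to its BackupResponse: the agent remembers the id it sent and only accepts a response that echoes it. The names are made up for the example.
final class BackupQueryCorrelation {
    private long outstandingCorrelationId = -1;

    void onQuerySent(long correlationId) {
        outstandingCorrelationId = correlationId; // remember the id sent in the BackupQuery
    }

    boolean onBackupResponse(long correlationId) {
        if (correlationId != outstandingCorrelationId) {
            return false; // stale or unexpected response - ignore it
        }
        outstandingCorrelationId = -1; // accepted: store the response data and move to the next stage
        return true;
    }

    public static void main(String[] args) {
        BackupQueryCorrelation agent = new BackupQueryCorrelation();
        agent.onQuerySent(42);
        System.out.println(agent.onBackupResponse(41)); // false - not the outstanding query
        System.out.println(agent.onBackupResponse(42)); // true
    }
}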
Challenge¶
This is the same as the Challenge on the Egress, except the Consensus Module is returning it to the ClusterBackupAgent. The ClusterBackupAgent uses the Consensus streamId for the response stream, not the Egress, so the Consensus Module sends the Challenge over the Consensus stream. It is a separate channel to the Consensus Module, so it doesn't affect the normal Consensus channels between Consensus Modules.
ChallengeResponse¶
This is the same as the ChallengeResponse on the Ingress, except it's the ClusterBackupAgent sending the message on the Consensus stream.
HeartbeatRequest¶
This is a request that can be sent to a Consensus Module, but isn't currently used in the Open Source version of Aeron Cluster. It looks like it is used by Aeron Cluster Standby, part of Aeron Premium.
It can be sent from heartbeatRequest() in the ConsensusPublisher, but this is an unused method.
HeartbeatRequest {
correlationId long // to return in the response
responseStreamId int // to connect back to the sender
responseChannel String // to connect back to the sender
encodedCredentials byte[] // if authenticating the request
}
It is received in the Consensus Module by the ConsensusAdapter. It calls into onHeartbeatRequest(), which adds a ClusterSession into the Consensus Module's pendingBackupSessions with an Action of HEARTBEAT (unlike client sessions, which have an Action of CLIENT). Like client sessions, if authentication is configured, the credentials are checked, which might cause a Challenge to be sent, although this is sent on the responseStreamId specified in the HeartbeatRequest, which will probably be the Consensus streamId 108, not the Egress. Once authenticated, the Consensus Module returns a HeartbeatResponse, then removes the ClusterSession.
HeartbeatResponse¶
This is sent by a Consensus Module in response to a HeartbeatRequest.
HeartbeatResponse {
correlationId long // from the HeartbeatRequest
}
It isn't currently received by anything in the Open Source version of Aeron Cluster.
StandbySnapshot¶
This is a notification from the Consensus Module that a Standby Snapshot has been taken on a remote node. It is used by Aeron Cluster Standby, part of Aeron Premium, not the Open Source version of Aeron Cluster.
It is sent from standbySnapshotTaken() in the ConsensusPublisher, but this is an unused method in the Open Source version.
StandbySnapshot {
correlationId long
version int
responseStreamId int
snapshots [
recordingId long
leadershipTermId long
termBaseLogPosition long
logPosition long
timestamp long
serviceId int
archiveEndpoint String
]
responseChannel String
encodedCredentials byte[]
}