Consensus Messages¶
Most Consensus messages are sent between the Consensus Modules. They are primarily used to achieve consensus on the Log, or to communicate during an Election. Each cluster member keeps a Consensus Publication to each of the others. There are some other messages sent using the Consensus streamId that come from external tooling / frameworks - these are highlighted in the list below.
The Consensus messages are described below.
- CanvassPosition
- RequestVote
- Vote
- NewLeadershipTerm
- AppendPosition
- CommitPosition
- CatchupPosition
- StopCatchup
- TerminationPosition
- TerminationAck
- BackupQuery (from ClusterBackupAgent)
- BackupResponse (to ClusterBackupAgent)
- Challenge (to ClusterBackupAgent)
- ChallengeResponse (from ClusterBackupAgent)
- HeartbeatRequest (from Cluster Standby)
- HeartbeatResponse (to Cluster Standby)
- StandbySnapshot (part of Cluster Standby)
CanvassPosition¶
Sent by a Consensus Module's Election object when it's in the CANVASS or NOMINATE state, to all the other cluster members. It publishes the member's Log Recording append position, so they can all see which member has the most recent state, prior to electing a new leader.
CanvassPosition {
logLeadershipTermId long // leadership term of the last message in the Log Recording
logPosition long // position of the last message in the Log Recording
leadershipTermId long // current leadership term of the member
followerMemberId int // this member's id, so the recipients know the sender
protocolVersion int // consensus module version
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onCanvassPosition(); otherwise, the ConsensusModuleAgent returns a NewLeadershipTerm to the sender, which tells it to join the current leadership term.
Example: first CanvassPosition after starting up a new cluster member with no state:
CanvassPosition {
logLeadershipTermId: -1 // nothing in the Log Recording
logPosition : 0 // never received any Log messages
leadershipTermId : -1 // current term (not in one, as not yet been in an Election)
followerMemberId : 1 // member id
protocolVersion : 65536 // hardcoded version
}
See onCanvassPosition to see how it's handled during an Election.
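As an illustration of the comparison that canvassing enables, here's a minimal sketch in plain Java (not Aeron's actual code - the class and method names are made up). The leadership term id of the last Log entry is compared first, and the position only breaks ties within the same term.
final class CanvassComparison {
    // Returns true if log state A (termIdA, positionA) is at least as recent as log state B.
    static boolean isAtLeastAsRecent(long termIdA, long positionA, long termIdB, long positionB) {
        if (termIdA != termIdB) {
            return termIdA > termIdB; // a later leadership term always wins
        }
        return positionA >= positionB; // same term: the higher append position wins
    }

    public static void main(String[] args) {
        // A freshly started member (termId -1, position 0) is behind a member that has
        // appended Log entries in leadership term 0.
        System.out.println(isAtLeastAsRecent(-1, 0, 0, 2848)); // false
        System.out.println(isAtLeastAsRecent(0, 2848, -1, 0)); // true
    }
}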
RequestVote¶
Sent by a Consensus Module's Election object when it's in the CANDIDATE_BALLOT state, to request that the other members vote for it. This is when the cluster member has received CanvassPosition messages from the other members and thinks it has the latest or joint-latest state in its Log Recording, so it is a candidate to become leader.
RequestVote {
logLeadershipTermId long // leadership term of the last message in the Log Recording
logPosition long // position of the last message in the Log Recording
candidateTermId long // proposed next leadership term
candidateMemberId int // this member's id, so the recipients know the sender
protocolVersion int // consensus module version
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onRequestVote(); otherwise, if the candidateTermId is higher than its current leadership term, it enters an Election itself.
Example: first RequestVote after starting up a new cluster member with no state:
RequestVote {
logLeadershipTermId: -1 // nothing in the Log Recording
logPosition : 0 // never received any Log messages
candidateTermId : 0 // proposed next leadership term
candidateMemberId : 1 // sender's member id
protocolVersion : 65536 // hardcoded version
}
See onRequestVote to see how it's handled during an Election.
Vote¶
Sent by a Consensus Module's Election object in response to a RequestVote.
Vote {
candidateTermId long // from RequestVote
logLeadershipTermId long // sender's leadership term of the last message in the Log Recording
logPosition long // sender's position of the last message in the Log Recording
candidateMemberId int // from RequestVote
followerMemberId int // this member's id
vote boolean // voting intention
}
It is received in the Consensus Module by the ConsensusAdapter. If it's in an Election, it is given to Election.onVote(), which records the vote for the cluster member; otherwise it's ignored.
See onVote to see how it's handled during an Election.
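For illustration, here's a small sketch of tallying Vote messages and checking for a majority. It isn't Aeron's Election code (the names are hypothetical), but it shows the basic rule: a candidate needs positive votes from more than half of the cluster members, counting its own.
import java.util.HashMap;
import java.util.Map;

final class VoteTally {
    private final int clusterSize;
    private final Map<Integer, Boolean> votesByMemberId = new HashMap<>();

    VoteTally(int clusterSize, int candidateMemberId) {
        this.clusterSize = clusterSize;
        votesByMemberId.put(candidateMemberId, Boolean.TRUE); // a candidate votes for itself
    }

    void onVote(int followerMemberId, boolean vote) {
        votesByMemberId.put(followerMemberId, vote); // record the follower's voting intention
    }

    boolean hasMajority() {
        long positiveVotes = votesByMemberId.values().stream().filter(v -> v).count();
        return positiveVotes > clusterSize / 2;
    }

    public static void main(String[] args) {
        VoteTally tally = new VoteTally(3, 1);   // 3-node cluster, member 1 is the candidate
        tally.onVote(2, true);                   // one positive vote from a follower: 2 of 3
        System.out.println(tally.hasMajority()); // true
    }
}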
NewLeadershipTerm¶
This is sent by the leader Consensus Module in three circumstances:
- when it has won the Election, to notify the other members. It describes the leadership term that it is leading. The same message is sent to all followers every 200 ms, while in LEADER_LOG_REPLICATION, LEADER_REPLAY, and LEADER_READY
- it can be sent as a reply, specifically to one member, if the member starts and sends a CanvassPosition message while the other members are already running. The leader doesn't enter into an Election itself. It echoes the logLeadershipTermId field from the CanvassPosition message and returns information about the next leadership term after that one, and about the current leadership term. The member uses this information to start the process of back-filling missing Log entries
- it can be sent as a reply, specifically to one member, if the member sends a RequestVote message. To be honest, I'm not sure how this situation can arise! todo: investigate
These are the fields and how they are used when sent in response to a CanvassPosition (see Replication and Catchup Details for a worked example):
NewLeadershipTerm {
logLeadershipTermId long // echoed from CanvassPosition - the next 3 fields are relative to this term
nextLeadershipTermId long // next term's id
nextTermBaseLogPosition long // next term's start position
nextLogPosition long // next term's log position (end position)
--- above here is in response to a CANVASS
leadershipTermId long // the current term's id
termBaseLogPosition long // the current term's start position
logPosition long // the current log position (log publisher position)
leaderRecordingId long // the current log recordingId on the leader
timestamp long // current time on the leader (epoch time in cluster time units)
leaderMemberId int // leader's id
logSessionId int // sessionId of the log Publication
appVersion int // consensus module version
isStartup boolean // FALSE when sent in reply to a CanvassPosition
}
When not sent in response to a CanvassPosition, the first 4 fields refer to the leadership term that is about to start.
As a concrete example, these are all the NewLeadershipTerm messages sent in the Follower offline deep dive:
- 1) is where the 3 node cluster started for the first time and node 1 became leader in leadership term 0
- 8) is after node 0 was killed and nodes 1 and 2 restarted without it - node 1 became leader again in leadership term 1
- 13) is after nodes 1 and 2 restarted without node 0 - node 2 became leader in leadership term 2
- 18) is after nodes 1 and 2 restarted without node 0 - node 1 became leader in leadership term 3
- 22) is after node 0 started and went through the replication and catchup process, progressing its way through the leadership terms
1) in LEADER_LOG_REPLICATION, to nodes 0 and 2: payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, to nodes 0 and 2: payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=-1|nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=-1|timestamp=1737306778533|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
in LEADER_READY, to nodes 0 and 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=0 |timestamp=1737306778547|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=0 |nextLeadershipTermId=0|nextTermBaseLogPosition=0 |nextLogPosition=-1 |leadershipTermId=0|termBaseLogPosition=0 |logPosition=0 |leaderRecordingId=0 |timestamp=1737306778547|leaderMemberId=1|logSessionId=464720373 |appVersion=1|isStartup=TRUE
8) in LEADER_LOG_REPLICATION, to node 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850311|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 2: payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850312|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850335|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
in LEADER_READY, to node 2: payload: logLeadershipTermId=1 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=-1 |leadershipTermId=1|termBaseLogPosition=2848|logPosition=2848|leaderRecordingId=0 |timestamp=1737306850656|leaderMemberId=1|logSessionId=1884081998 |appVersion=1|isStartup=TRUE
13) in LEADER_LOG_REPLICATION, to node 1: payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917688|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 1: payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917688|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306917698|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
in LEADER_READY, to node 1: payload: logLeadershipTermId=2 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=-1 |leadershipTermId=2|termBaseLogPosition=4064|logPosition=4064|leaderRecordingId=0 |timestamp=1737306918111|leaderMemberId=2|logSessionId=-2066487224|appVersion=1|isStartup=TRUE
18) in LEADER_LOG_REPLICATION, to node 2: payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985374|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
in LEADER_REPLAY, both to node 2: payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985374|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985388|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
in LEADER_READY, to node 2: payload: logLeadershipTermId=3 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=5280|leaderRecordingId=0 |timestamp=1737306985861|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=TRUE
22) onCanvassPosition, to node 0 (all 4): payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=4064|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989652|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=0 |nextLeadershipTermId=1|nextTermBaseLogPosition=2848|nextLogPosition=4064|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989744|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=1 |nextLeadershipTermId=2|nextTermBaseLogPosition=4064|nextLogPosition=5280|leadershipTermId=3|termBaseLogPosition=5280|logPosition=6496|leaderRecordingId=0 |timestamp=1737306989946|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
payload: logLeadershipTermId=2 |nextLeadershipTermId=3|nextTermBaseLogPosition=5280|nextLogPosition=-1 |leadershipTermId=3|termBaseLogPosition=5280|logPosition=6592|leaderRecordingId=0 |timestamp=1737306989974|leaderMemberId=1|logSessionId=1115055142 |appVersion=1|isStartup=FALSE
It is received in the Consensus Module by the ConsensusAdapter and passed to the ConsensusModuleAgent. If the Consensus Module is in an Election at this point, it accepts the new leader and moves to the appropriate Election state as a follower. This could be FOLLOWER_LOG_REPLICATION if it is behind with the Log, or FOLLOWER_REPLAY to start replaying the Log, etc.
See onNewLeadershipTerm to see how it's handled during an Election.
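To make the follower-side decision concrete, here's a rough sketch of the kind of check a follower makes when accepting a NewLeadershipTerm. The enum and condition are simplifications assumed for the example, not the real Election logic - see onNewLeadershipTerm for that.
final class FollowerStateChoice {
    enum ElectionState { FOLLOWER_LOG_REPLICATION, FOLLOWER_REPLAY }

    // If the follower's Log Recording hasn't reached the base position of the leader's
    // current term, it must first replicate the missing earlier term(s) from the leader;
    // otherwise it can go straight to replaying its own Log Recording.
    static ElectionState onNewLeadershipTerm(long followerAppendPosition, long termBaseLogPosition) {
        return followerAppendPosition < termBaseLogPosition
            ? ElectionState.FOLLOWER_LOG_REPLICATION
            : ElectionState.FOLLOWER_REPLAY;
    }

    public static void main(String[] args) {
        // Node 0 in the worked example above: its Log stops at 2848, but the leader's
        // current term (term 3) starts at 5280, so it must replicate first.
        System.out.println(onNewLeadershipTerm(2848, 5280)); // FOLLOWER_LOG_REPLICATION
    }
}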
AppendPosition¶
Sent by the follower Consensus Modules to tell the leader Consensus Module the follower's append position, i.e. how much of the Log it has written to disk in its Log Recording. It is sent from the Consensus Module's regular duty cycle, whenever the append position has changed or on the leader heartbeat interval (200 ms).
It is also sent from a few states in an Election, to confirm to the leader where the follower is up to. Note that the leadershipTermId field refers to the leadership term that the follower is in. It does not refer to the leadership term of the Log entry at logPosition. If the follower is Replicating missing Log entries, the Log entries will be from an earlier leadership term.
AppendPosition {
leadershipTermId long // current leadership term on the follower
logPosition long // follower's append position
followerMemberId int // follower's id
flags byte // used to indicate whether the follower is in FOLLOWER_CATCHUP
}
It is received in the leader's Consensus Module by the ConsensusAdapter, and is ultimately stored against the member. The leader's Consensus Module will use this when calculating the commit position.
See onAppendPosition to see how it's handled during an Election.
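As a rough illustration of the duty-cycle rule above (send when the append position has changed, or at least every heartbeat interval), here's a minimal sketch; the class and names are hypothetical.
final class AppendPositionSender {
    private static final long HEARTBEAT_INTERVAL_NS = 200_000_000L; // 200 ms

    private long lastSentPosition = -1;
    private long deadlineNs = 0;

    // Called from the duty cycle; returns true if an AppendPosition should be published now.
    boolean shouldSend(long appendPosition, long nowNs) {
        if (appendPosition != lastSentPosition || nowNs >= deadlineNs) {
            lastSentPosition = appendPosition;
            deadlineNs = nowNs + HEARTBEAT_INTERVAL_NS;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AppendPositionSender sender = new AppendPositionSender();
        System.out.println(sender.shouldSend(2848, 0));            // true  - position not yet reported
        System.out.println(sender.shouldSend(2848, 50_000_000L));  // false - unchanged, within 200 ms
        System.out.println(sender.shouldSend(2848, 250_000_000L)); // true  - heartbeat interval elapsed
    }
}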
CommitPosition¶
Sent by the leader Consensus Module to tell the follower Consensus Modules the new commit position, i.e. the Log position that a majority of the cluster members have written to disk in their Log Recording. This is the position that the Clustered Services are safe to process up to.
CommitPosition {
leadershipTermId long // current leadership term on the leader
logPosition long // the commit position, calculated from a majority of the AppendPositions
leaderMemberId int // leader's id
}
It is received in the follower Consensus Modules by the ConsensusAdapter. The follower Consensus Module processes the Log up to the new commit position, then sets the commit-pos Counter, which the Clustered Service uses as a limit when it processes the Log.
See onCommitPosition to see how it's handled during an Election.
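To make the 'majority have appended it' rule concrete, here's a small sketch of deriving a commit position from the latest append positions of all members (including the leader's own). It's just the underlying idea, not the Consensus Module's actual calculation.
import java.util.Arrays;

final class CommitPositionCalc {
    // The commit position is the highest Log position that a majority of members have appended.
    static long quorumPosition(long[] appendPositions) {
        long[] sorted = appendPositions.clone();
        Arrays.sort(sorted); // ascending
        // With n members, a majority is n/2 + 1, so take the (n/2 + 1)-th highest value,
        // which after an ascending sort sits at index (n - 1) - n/2.
        return sorted[(sorted.length - 1) - sorted.length / 2];
    }

    public static void main(String[] args) {
        // Leader and one follower at 6496, one follower still catching up at 2848:
        // two of the three members have reached 6496, so that is the commit position.
        System.out.println(quorumPosition(new long[] { 6496, 2848, 6496 })); // 6496
    }
}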
CatchupPosition¶
Sent by a follower Consensus Module's Election object when it's in the FOLLOWER_CATCHUP_INIT state. This happens when the leader has been running in the current leadership term without the follower, so the follower has to catch up within the current leadership term.
The request asks the leader to have its Archive set up a Replay of the Log, starting from the logPosition in the message, to the catchupEndpoint in the message, which the follower's Consensus Module will be listening on.
CatchupPosition {
leadershipTermId long // current leadership term on the follower
logPosition long // position to replay the leader's Log from
followerMemberId int // follower's id
catchupEndpoint String // follower's catchup endpoint that the leader's Archive will replay to
}
It is received in the leader's Consensus Module by the ConsensusAdapter. The leader asks its Archive to replay the Log to the follower's catchup endpoint.
See onCatchupPosition to see how it's handled during an Election.
StopCatchup¶
Sent by the leader Consensus Module to a follower Consensus Module to tell the follower to stop listening on the Catchup endpoint. While a follower is catching up from the leader's Log, it is subscribed to an Archive Replay of the leader's Log, which goes to the Catchup endpoint on the follower, and to the leader's live Log Publication, which goes to a live Log endpoint on the follower. The follower sends AppendPosition messages to the leader with the 'catchup' flag set. When the leader receives these, it checks whether the follower's position has caught up with its own. When it has, the leader tells its Archive to stop replaying the Log to the follower, and sends StopCatchup to the follower, telling it to remove its Catchup endpoint.
StopCatchup {
leadershipTermId long // current leadership term on the follower
followerMemberId int // follower's id
}
It is received in the follower's Consensus Module by the ConsensusAdapter. The follower removes its catchup endpoint from its Log Subscription, so its Log Subscription is only receiving Log messages from the live Log Publication. This is 'replay merge', and the follower has now caught up.
See onStopCatchup to see how it's handled during an Election.
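Putting the leader-side check above into a sketch: when an AppendPosition arrives with the catchup flag set and the follower has reached the leader's position, the leader stops the replay and sends StopCatchup. The interfaces below are hypothetical placeholders, not Aeron classes.
final class CatchupMonitor {
    interface ArchiveControl { void stopReplay(int followerMemberId); }
    interface ConsensusPublisher { void stopCatchup(long leadershipTermId, int followerMemberId); }

    private final ArchiveControl archive;
    private final ConsensusPublisher publisher;

    CatchupMonitor(ArchiveControl archive, ConsensusPublisher publisher) {
        this.archive = archive;
        this.publisher = publisher;
    }

    // Called when the leader receives an AppendPosition from a catching-up follower.
    void onAppendPosition(long leadershipTermId, long followerPosition, int followerMemberId,
                          boolean catchupFlag, long leaderPosition) {
        if (catchupFlag && followerPosition >= leaderPosition) {
            archive.stopReplay(followerMemberId);                      // stop replaying the Log to the follower
            publisher.stopCatchup(leadershipTermId, followerMemberId); // tell it to drop its Catchup endpoint
        }
    }
}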
TerminationPosition¶
Sent by the leader Consensus Module to the follower Consensus Modules to tell them to perform an orderly shutdown after reaching a particular Log position, because the leader is shutting down. The shutdown is initiated using ClusterTool, which can ask the cluster to shut down (with a snapshot) or abort (without a snapshot).
TerminationPosition {
leadershipTermId long // current leadership term on the leader
logPosition long // the Log position to reach before terminating
}
It is received in the follower Consensus Modules by the ConsensusAdapter. It sets a terminationPosition field in the ConsensusModuleAgent, which it checks on its duty cycle. When found, it sends a ServiceTerminationPosition to the Clustered Service, asking it to perform an orderly shutdown at the Log position. The Clustered Service does so, then returns a ServiceAck, confirming the Log position. The Consensus Module sends a TerminationAck to the leader Consensus Module, records the Log position in the RecordingLog, then terminates.
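As a condensed sketch of that shutdown sequence (with hypothetical placeholder interfaces, not the real Consensus Module plumbing):
final class TerminationFlow {
    interface ServiceProxy { void terminationPosition(long logPosition); }      // ServiceTerminationPosition to the service
    interface ConsensusPublisher { void terminationAck(long leadershipTermId, long logPosition, int memberId); }
    interface RecordingLog { void recordTerminationPosition(long logPosition); }

    static void shutDown(long leadershipTermId, long terminationPosition, int memberId,
                         ServiceProxy service, ConsensusPublisher publisher, RecordingLog recordingLog) {
        service.terminationPosition(terminationPosition);  // 1. ask the Clustered Service to stop at the position
        // 2. the Clustered Service processes the Log up to the position, then replies with a ServiceAck
        publisher.terminationAck(leadershipTermId, terminationPosition, memberId); // 3. acknowledge to the leader
        recordingLog.recordTerminationPosition(terminationPosition);               // 4. record the position
        // 5. terminate this member
    }
}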
TerminationAck¶
Sent by a follower Consensus Module to the leader Consensus Module to acknowledge that it (and its Clustered Service) has reached the Log position specified in a TerminationPosition, and is about to terminate.
TerminationAck {
leadershipTermId long // current leadership term on the follower
logPosition long // current Log position on the follower
memberId int // follower's id
}
It is received in the leader Consensus Module by the ConsensusAdapter. Once all the followers have acknowledged, the leader terminates.
BackupQuery¶
Sent by ClusterBackupAgent to one of the Consensus Modules (chosen at random), to query information about snapshots.
BackupQuery {
correlationId long // to return in the response
responseStreamId int // to connect back to ClusterBackupAgent to send the response
version int // consensus module version
responseChannel String // to connect back to ClusterBackupAgent to send the response
encodedCredentials byte[] // set if authentication is enabled, to authenticate ClusterBackupAgent
}
It is received in the Consensus Module by the ConsensusAdapter. It calls into onBackupQuery(), which adds a ClusterSession into the Consensus Module's pendingBackupSessions with an Action of BACKUP (unlike client sessions, which have an Action of CLIENT). Like client sessions, if authentication is configured, the credentials are checked, which might cause a Challenge to be sent, although this is sent on the responseStreamId specified in the BackupQuery, which happens to be the Consensus streamId 108, not the Egress. Once authenticated, the Consensus Module returns a BackupResponse, using information from the RecordingLog and RecoveryPlan, then removes the ClusterSession.
BackupResponse¶
Sent by a Consensus Module in response to a BackupQuery, returning it to the responseChannel and responseStreamId in the query. This includes all relevant information for a backup operation.
BackupResponse {
correlationId long // from the BackupQuery
logRecordingId long // from the RecoveryPlan
logLeadershipTermId long // from the RecoveryPlan
logTermBaseLogPosition long // from the RecoveryPlan
lastLeadershipTermId long // from the last entry in the RecordingLog
lastTermBaseLogPosition long // from the last entry in the RecordingLog
commitPositionCounterId int // counter id on the consensus module
leaderMemberId int // leader's id
memberId int // of the consensus module
snapshots [ // snapshots for consensus module and clustered service(s)
recordingId long // of the last snapshot
leadershipTermId long // of the last snapshot
termBaseLogPosition long // of the last snapshot
logPosition long // of the last snapshot
timestamp long // of the last snapshot (epoch time in cluster time units)
serviceId int // of the last snapshot
]
clusterMembers String // endpoints string
}
It is received by the ClusterBackupAgent, which stores the response data and moves on to the next stage of the backup process.
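For completeness, here's a tiny sketch of the correlationId pattern that ties a BackupQuery to its BackupResponse: the agent remembers the id it sent and only accepts a response that echoes it. The names are made up for the example.
final class BackupQueryCorrelation {
    private long outstandingCorrelationId = -1;

    void onQuerySent(long correlationId) {
        outstandingCorrelationId = correlationId; // remember the id sent in the BackupQuery
    }

    boolean onBackupResponse(long correlationId) {
        if (correlationId != outstandingCorrelationId) {
            return false; // stale or unexpected response - ignore it
        }
        outstandingCorrelationId = -1; // accepted: store the response data and move to the next stage
        return true;
    }

    public static void main(String[] args) {
        BackupQueryCorrelation agent = new BackupQueryCorrelation();
        agent.onQuerySent(42);
        System.out.println(agent.onBackupResponse(41)); // false - not the outstanding query
        System.out.println(agent.onBackupResponse(42)); // true
    }
}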
Challenge¶
This is the same as the Challenge on the Egress, except the Consensus Module is returning it to the ClusterBackupAgent. The ClusterBackupAgent uses the Consensus streamId for the response stream, not the Egress, so the Consensus Module sends the Challenge over the Consensus stream. It is a separate channel to the Consensus Module, so it doesn't affect the normal Consensus channels between Consensus Modules.
ChallengeResponse¶
This is the same as the ChallengeResponse on the Ingress, except it's the ClusterBackupAgent sending the message on the Consensus stream.
HeartbeatRequest¶
This is a request that can be sent to a Consensus Module, but isn't currently used in the Open Source version of Aeron Cluster. It looks like it is used by Aeron Cluster Standby, part of Aeron Premium.
It can be sent from heartbeatRequest() in the ConsensusPublisher, but this is an unused method.
HeartbeatRequest {
correlationId long // to return in the response
responseStreamId int // to connect back to the sender
responseChannel String // to connect back to the sender
encodedCredentials byte[] // if authenticating the request
}
It is received in the Consensus Module by the ConsensusAdapter. It calls into onHeartbeatRequest(), which adds a ClusterSession into the Consensus Module's pendingBackupSessions with an Action of HEARTBEAT (unlike client sessions, which have an Action of CLIENT). Like client sessions, if authentication is configured, the credentials are checked, which might cause a Challenge to be sent, although this is sent on the responseStreamId specified in the HeartbeatRequest, which will probably be the Consensus streamId 108, not the Egress. Once authenticated, the Consensus Module returns a HeartbeatResponse, then removes the ClusterSession.
HeartbeatResponse¶
This is sent by a Consensus Module in response to a HeartbeatRequest.
HeartbeatResponse {
correlationId long // from the HeartbeatRequest
}
It isn't currently received by anything in the Open Source version of Aeron Cluster.
StandbySnapshot¶
This is a notification from the Consensus Module that a Standby Snapshot has been taken on a remote node. It is used by Aeron Cluster Standby, part of Aeron Premium, not the Open Source version of Aeron Cluster.
It is sent from standbySnapshotTaken() in the ConsensusPublisher, but this is an unused method in the Open Source version.
StandbySnapshot {
correlationId long
version int
responseStreamId int
snapshots [
recordingId long
leadershipTermId long
termBaseLogPosition long
logPosition long
timestamp long
serviceId int
archiveEndpoint String
]
responseChannel String
encodedCredentials byte[]
}