Duty Cycle
ConsensusModuleAgent.doWork()
All the consensus behaviour in the Consensus Module is in the ConsensusModuleAgent's duty cycle, which is the doWork() method. Once the ConsensusModuleAgent has started, doWork() is called on it repeatedly by its AgentRunner. At a high level, the duty cycle performs the following:
- it updates the Consensus Module DutyCycleStallTracker, which tracks stalls in the Cluster max cycle time and Cluster work cycle time exceeded count Counters
- it performs 'slow tick work' (described below) if the slow tick deadline has passed, which happens every 10 ms
- it polls the ConsensusAdapter, reading and processing any Consensus messages from the other cluster members
- then, it either performs Election work or Consensus work:
    - if it has an Election object, it calls doWork() on it, to perform one cycle of Election work
    - if it doesn't, it performs Consensus work (described below)
Elections are a big topic and are described later.
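The ordering of these steps can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the Aeron source: DutyCycleSketch, its fields and the trace list are hypothetical names, and the stall tracker, adapter and Election are reduced to trace entries so that only the control flow is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the duty cycle ordering described above.
// None of these names are real Aeron types; they only mirror the listed steps.
final class DutyCycleSketch
{
    static final long SLOW_TICK_INTERVAL_NS = 10_000_000L; // 10 ms, as stated in the text

    final List<String> trace = new ArrayList<>(); // records which steps ran, in order
    boolean hasElection;                          // stands in for "it has an Election object"
    long slowTickDeadlineNs;

    int doWork(final long nowNs)
    {
        int workCount = 0;

        trace.add("stallTracker");              // 1. update the duty cycle stall tracker

        if (nowNs >= slowTickDeadlineNs)        // 2. slow tick work, only once the deadline has passed
        {
            slowTickDeadlineNs = nowNs + SLOW_TICK_INTERVAL_NS;
            trace.add("slowTickWork");
            workCount++;
        }

        trace.add("consensusAdapter.poll");     // 3. read Consensus messages from other members

        if (hasElection)                        // 4. either one cycle of Election work...
        {
            trace.add("election.doWork");
        }
        else                                    //    ...or Consensus work
        {
            trace.add("consensusWork");
        }
        workCount++;

        return workCount;
    }
}
```

Calling doWork(0L) on a fresh instance runs all four steps, including slow tick work. A second call 5 ms later skips slow tick work because the 10 ms deadline has not yet passed.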
Slow tick work
This is in slowTickWork() and consists of:
- the Client Conductor in the Aeron Transport Client is configured to run in invoker mode, so it is invoked here. This work consists of checking for various timeouts and processing any messages from the Media Driver, such as a confirmation that an operation succeeded or failed (like creating a Publication), or when a new Image becomes available for a Subscription
- updating the timestamp in the cluster mark file once per second
- polling for Recording Signal events from the Archive, such as an error event, or an indication that a Recording has stopped
- sending any redirect messages to ClusterSessions in redirectUserSessions (see Clients)
- sending any rejection messages to clients in rejectedUserSessions (see Clients)
- sending any rejection messages to backup clients in rejectedBackupSessions (see Clients)
After this, slow tick work diverges depending on the Consensus Module's cluster role.
If it's the leader, slow tick work continues with:
- it checks a Cluster control toggle Counter, which can be set by ClusterTool to toggle the cluster into a different state, such as SUSPEND, RESUME or SNAPSHOT
- it processes any pending ClusterSessions in pendingUserSessions (CLIENT sessions - see ActionType)
- it processes any pending ClusterSessions in pendingBackupSessions (non-CLIENT sessions - see ActionType)
- after processing pending sessions, it checks all the current CLIENT ClusterSessions. It closes a session if it has timed out (last activity more than 10 seconds ago, set by aeron.cluster.session.timeout). If it is a new session and a SessionEvent hasn't successfully been sent on the Egress yet, it is retried. If a NewLeaderEvent is pending on the Egress, this is also attempted
- it checks if it still has an active quorum of followers. A follower is considered active if it has sent an AppendPosition message within the last 10 seconds (aeron.cluster.leader.heartbeat.timeout). If it doesn't have an active quorum, it starts an Election with reason "inactive follower quorum"
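The active quorum check in the last step can be illustrated with a minimal sketch. It assumes that a quorum is a simple majority of cluster members and that the leader counts itself as active. ActiveQuorumSketch and its method names are hypothetical and do not reproduce the Aeron implementation.

```java
// Hypothetical sketch of an "active quorum" check; not the Aeron implementation.
final class ActiveQuorumSketch
{
    // Assumed majority rule: 1 of 1, 2 of 3, 3 of 5 and so on.
    static int quorumThreshold(final int memberCount)
    {
        return (memberCount / 2) + 1;
    }

    // followerLastAppendPositionNs[i] is the time at which follower i last sent
    // an AppendPosition message. The leader is assumed to always count as active.
    static boolean hasActiveQuorum(
        final long[] followerLastAppendPositionNs, final long nowNs, final long timeoutNs)
    {
        int active = 1; // the leader itself

        for (final long lastNs : followerLastAppendPositionNs)
        {
            if (nowNs - lastNs <= timeoutNs)
            {
                active++;
            }
        }

        final int memberCount = followerLastAppendPositionNs.length + 1;
        return active >= quorumThreshold(memberCount);
    }
}
```

In a three-node cluster with a 10 second timeout, the leader keeps its quorum as long as at least one follower has been heard from in the last 10 seconds. If both followers go quiet, the check fails and, per the text above, the leader would start an Election.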
If it's a follower:
- it processes pending ClusterSessions with action types: BACKUP, HEARTBEAT and STANDBY_SNAPSHOT (see ActionType)
Then, if it's a follower or a candidate:
- if it hasn't received a message from the leader for 10 seconds (aeron.cluster.leader.heartbeat.timeout), it enters an Election with the reason "leader heartbeat timeout"
And finally:
- it checks a Node control toggle Counter, which is similar to the Cluster control toggle, but can be run on any node, not just the leader. This Counter is currently only used to replicate standby snapshots (an Aeron Premium feature).
Consensus work
This is in consensusWork(). If it's the leader, it:
- checks the TimerService for any expired timers. For each expired timer, it adds a TimerEvent to the Log. Being on the Log means it will go through the same consensus process as the Ingress messages, and will be processed at the same position relative to other messages on each cluster member
- polls the PendingServiceMessageTracker (see link for details)
- polls the IngressAdapter, which polls the Ingress Subscription. This polls each of the Ingress log buffers and calls a method on the ConsensusModuleAgent. Messages from the client application will be wrapped with a SessionMessageHeader - these are passed to onIngressMessage(). That adds the message to the Log (via LogPublisher), also adding the current cluster time. Other messages, such as SessionConnectRequest, are passed to other methods, e.g. onSessionConnect()
- finally, it reads the AppendPosition Counter, which is how much of the Log the Archive has written to disk. It sets the logPosition on its ClusterMember, then calculates a new quorum position (this could change if the leader's Archive was slower to write to disk than a follower's Archive, or for a single node cluster). If the commit position has changed, or it hasn't sent a CommitPosition message for 200 ms, it sends the quorum position to the followers in a CommitPosition message. It then sweeps uncommitted entries up to the commit position
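To see why the leader's own AppendPosition can move the quorum position, here is a minimal sketch of a quorum position calculation. It assumes the quorum position is the highest Log position that a simple majority of members have reached. QuorumPositionSketch is a hypothetical name, and the real calculation in Aeron may be structured differently.

```java
import java.util.Arrays;

// Hypothetical sketch: the quorum position is taken to be the highest position
// that a majority of cluster members have written to disk.
final class QuorumPositionSketch
{
    static long quorumPosition(final long[] memberLogPositions)
    {
        final long[] sorted = memberLogPositions.clone();
        Arrays.sort(sorted); // ascending order

        final int quorumThreshold = (sorted.length / 2) + 1;

        // The quorumThreshold-th highest position is reached by at least
        // quorumThreshold members, so it is safe to treat as committed.
        return sorted[sorted.length - quorumThreshold];
    }
}
```

With a leader at 50 and followers at 80 and 60, the quorum position is 60. If the leader's Archive then catches up to 70, the quorum position becomes 70 without either follower reporting anything new. In a single node cluster, the quorum position is simply the leader's own position.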
If it's a follower, it:
- polls the LogAdapter up to the notifiedCommitPosition, which came from the leader in the last CommitPosition message. This lets the follower ConsensusModuleAgent process Log messages such as SessionOpenEvent, so it can maintain its list of ClusterSessions
- after that, it advances its commit-pos Counter to the new Log position, which will be notifiedCommitPosition. This lets the Clustered Service process the Log up to the same position, after the Consensus Module
- it polls the IngressAdapter. Note this is done even though it's a follower, as clients can connect to a follower when trying to find the leader. Messages are sent to the ConsensusModuleAgent, as they are on the leader, but the ConsensusModuleAgent checks whether it's the leader before processing them. The only message that's processed when the cluster member is a follower is a SessionConnectRequest. For a follower, it adds a ClusterSession to a list of redirectUserSessions, which will eventually cause a redirect to be sent to the client
- it then reads its AppendPosition Counter and if it has changed, or it hasn't sent an AppendPosition message to the leader for 200 ms, it sends the Counter value to the leader in an AppendPosition message
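The "send when changed, or when 200 ms have elapsed" rule in the last step can be sketched as follows. AppendPositionSenderSketch and its fields are hypothetical, and the actual send to the leader is replaced by a counter, so this only illustrates the condition.

```java
// Hypothetical sketch of the follower's AppendPosition send condition.
final class AppendPositionSenderSketch
{
    static final long SEND_INTERVAL_NS = 200_000_000L; // 200 ms, as stated in the text

    long lastSentPosition = -1; // nothing sent yet
    long lastSendNs;
    int sendCount;              // stands in for actually sending the message

    // Returns true if an AppendPosition message would be sent on this cycle.
    boolean maybeSend(final long appendPosition, final long nowNs)
    {
        final boolean changed = appendPosition != lastSentPosition;
        final boolean intervalElapsed = nowNs - lastSendNs >= SEND_INTERVAL_NS;

        if (changed || intervalElapsed)
        {
            lastSentPosition = appendPosition;
            lastSendNs = nowNs;
            sendCount++;
            return true;
        }

        return false;
    }
}
```

The periodic resend means that an idle follower with an unchanged position still sends AppendPosition messages regularly, which is how the leader's active quorum check described earlier keeps seeing it as active.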
And finally, it:
- polls the ConsensusModuleAdapter, where it reads any messages from the Clustered Service(s). This includes messages that the Clustered Service(s) want to append to the Log, which are added to the Clustered Service's PendingServiceMessageTracker