Leader Init
LEADER_INIT - start recording the Log Publication, ask ClusteredService to Subscribe to it
On Entry
Enters from LEADER_REPLAY, where, if it needed to, the Consensus Module replayed (processed) the Log up to the end of the Recording. The Clustered Service will have started doing the same, and may not have finished.
Description
In this state, the leader asks its Archive to start recording the Log Publication (created in LEADER_LOG_REPLICATION), either by appending to the end of the existing Log Recording, or creating a new one if there isn't one. It asks the Clustered Service to Subscribe to the Log Publication too, and waits for it to ack that it has done so. If there is an existing Log Recording, the Clustered Service may still be replaying and processing messages from it, from an Archive Replay Publication. If so, it will complete that and disconnect from the Archive Replay Publication, before subscribing to the live Log Publication.
Consensus Module
The Election asks the ConsensusModuleAgent to join the Log as leader (there's a different method for the follower). It passes it the Election's logSessionId, which is the sessionId of the Log Publication created in LEADER_LOG_REPLICATION. This is the session in the Log Publication that the Archive needs to start appending to the Log Recording (there is only one session, so this is just an extra safeguard).
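One way to picture that session filter is the session-id parameter that appears on the channels in the JoinLog example on this page. The helper below is a hypothetical sketch, not the Aeron API (Aeron has its own channel URI utilities), and it assumes the session is expressed as a channel parameter. It appends session-id to a channel string, using ? before the first parameter and | between later ones.

```java
/**
 * Hypothetical helper (not the Aeron API): restricts a channel string to a
 * single publication session by adding a session-id parameter.
 */
final class SessionChannels {
    private SessionChannels() {
    }

    static String withSessionId(String channel, int sessionId) {
        if (channel == null || channel.isEmpty()) {
            throw new IllegalArgumentException("channel must not be empty");
        }
        int query = channel.indexOf('?');
        String separator;
        if (query < 0) {
            separator = "?"; // no parameters yet: start the parameter list
        } else if (query == channel.length() - 1) {
            separator = "";  // channel already ends with '?'
        } else {
            separator = "|"; // later parameters are separated by '|'
        }
        return channel + separator + "session-id=" + sessionId;
    }

    public static void main(String[] args) {
        System.out.println(withSessionId("aeron:ipc", 1448296859));
        System.out.println(withSessionId("aeron:udp?tags=69", 1448296858));
    }
}
```

The first call produces aeron:ipc?session-id=1448296859, the same IPC channel that appears in the LEADER_REPLAY row of the example.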
Joining as leader does the following:
- It asks the Archive to start recording the Log Publication. If the RecordingLog has no record of an existing recording, the Consensus Module sends a StartRecordingRequest2 to start a new Recording. If there is an existing recording, it sends an ExtendRecordingRequest2 to start recording at the end of it. The Consensus Module passes it the Log channel, logStreamId (100) and logSessionId. It also specifies SourceLocation.LOCAL, as it is on the sending end of a UDP channel and can read from the sending log buffer directly, rather than round-tripping the Log over UDP to record it (the Archive will use a Spy subscription).
- When the Archive has set up the Recording, it creates a RecordingPos (rec-pos) Counter. It embeds the logSessionId in the Counter's label.
- The Consensus Module waits for the RecordingPos Counter to exist, looking for one with logSessionId in the label. It creates a read-only view of the Counter, which it refers to as the AppendPosition Counter in the Consensus Module (AppendPosition is rec-pos).
- The Consensus Module then waits for the Services to become ready. This involves sending a JoinLog to the Clustered Service, to get it to subscribe to the Log Publication, starting from logPosition. At this point, logPosition is appendPosition: the end of the Log, and also where recording of the live Log begins. Unlike when replaying the Log in LEADER_REPLAY, there is no maxLogPosition (it is sent as Long.MAX_VALUE). The Consensus Module then waits for a ServiceAck from the Clustered Service to say it has subscribed to the Log and is at logPosition.
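The recording and counter-lookup steps above can be sketched as follows. This is a hypothetical model, not Aeron code: Counter stands in for an entry in the counters table, and the assumption that the logSessionId appears in the rec-pos label as a whitespace-separated token (and the label format used in main) are for illustration only. In a real agent the lookup would be retried on each duty cycle until the Archive has created the Counter. Here findRecPos simply returns an empty Optional until then.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the leader "join the Log" steps; none of these names are Aeron classes.
final class LeaderJoinSketch {
    /** Stand-in for one entry in the counters table. */
    static final class Counter {
        final int id;
        final String type;  // e.g. "rec-pos"
        final String label; // assumed to contain the logSessionId as a separate token

        Counter(int id, String type, String label) {
            this.id = id;
            this.type = type;
            this.label = label;
        }
    }

    private LeaderJoinSketch() {
    }

    /** Extend the existing Log Recording if the RecordingLog knows of one, else start a new one. */
    static String archiveRequestFor(boolean recordingLogHasRecording) {
        return recordingLogHasRecording ? "ExtendRecordingRequest2" : "StartRecordingRequest2";
    }

    /** Returns the rec-pos Counter for logSessionId, or empty if the Archive has not created it yet. */
    static Optional<Counter> findRecPos(List<Counter> counters, int logSessionId) {
        String sessionToken = Integer.toString(logSessionId);
        for (Counter counter : counters) {
            // Compare whole tokens so that, e.g., 12 does not match a label containing 123.
            if ("rec-pos".equals(counter.type)
                && Arrays.asList(counter.label.trim().split("\\s+")).contains(sessionToken)) {
                return Optional.of(counter);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(archiveRequestFor(false)); // no prior Log Recording
        List<Counter> counters = Arrays.asList(
            // hypothetical label format, for illustration only
            new Counter(7, "rec-pos", "rec-pos: 0 1448296858 100 aeron:udp?tags=69"));
        findRecPos(counters, 1448296858)
            .ifPresent(c -> System.out.println("AppendPosition counter id " + c.id));
    }
}
```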
Note that the leader Consensus Module does not subscribe to the Log itself; only the Clustered Service does. This is different from LEADER_REPLAY, where both the leader Consensus Module and the Clustered Service replayed the Log Recording. For the live Log, the leader Consensus Module is what appends to the Log: it processes messages as it appends them. This doesn't require consensus, as the 'processing' done by the Consensus Module is just client session tracking (it maintains the client connections).
The payload of the JoinLog message differs depending on whether it is sent from LEADER_REPLAY (first row) or LEADER_INIT (second row), as shown in this example:
| logPosition | maxLogPosition | memberId | logSessionId | logStreamId | isStartup | role | logChannel |
|-------------|----------------|----------|--------------|-------------|-----------|------|------------|
| 0 | 1312 | 0 | 1448296859 | 103 | TRUE | 0 | aeron:ipc?session-id=1448296859 |
| 1312 | Long.MAX_VALUE | 0 | 1448296858 | 100 | TRUE | 2 | aeron-spy:aeron:udp?tags=69\|session-id=1448296858\|alias=log |
In LEADER_REPLAY, it's asking the Clustered Service to connect to the Archive: the logChannel is an IPC channel to the Archive, and it uses the replay streamId (103). It replays the Log from logPosition 0 (which could have been a snapshot position) to maxLogPosition 1312, which is the end of the Recording. At this point, the role in the JoinLog is hardcoded to FOLLOWER (0), which stops the Clustered Service from reconnecting the Egress sessions.
In LEADER_INIT, it's asking the Clustered Service to connect to the leader's (its own) Log Publication. It asks the Clustered Service to join the Log at the current Log position, 1312. The logChannel is the UDP channel that the followers will subscribe to, but because the Clustered Service is on the same machine (the sending end of the UDP channel), it uses a Spy subscription, hence the aeron-spy: prefix. The logStreamId is the normal Log streamId (100). The role is now LEADER (2).
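To make the differences concrete, here is a hypothetical sketch that builds both JoinLog payloads from the example. JoinLogSketch and its factory methods are illustrative names, not Aeron's message codec. The stream ids 103 and 100, the channel strings and the role codes (FOLLOWER = 0, LEADER = 2) are the values shown in the example table, and building the replay channel from the replay session id is an assumption based on that row.

```java
// Hypothetical model of the JoinLog fields shown in the example table (not Aeron's generated codec).
final class JoinLogSketch {
    static final int ROLE_FOLLOWER = 0;
    static final int ROLE_LEADER = 2;
    static final int REPLAY_STREAM_ID = 103; // value from the example
    static final int LOG_STREAM_ID = 100;    // value from the example

    final long logPosition;
    final long maxLogPosition;
    final int memberId;
    final int logSessionId;
    final int logStreamId;
    final boolean isStartup;
    final int role;
    final String logChannel;

    private JoinLogSketch(long logPosition, long maxLogPosition, int memberId, int logSessionId,
                          int logStreamId, boolean isStartup, int role, String logChannel) {
        this.logPosition = logPosition;
        this.maxLogPosition = maxLogPosition;
        this.memberId = memberId;
        this.logSessionId = logSessionId;
        this.logStreamId = logStreamId;
        this.isStartup = isStartup;
        this.role = role;
        this.logChannel = logChannel;
    }

    /** LEADER_REPLAY: a bounded replay over IPC from the Archive, with the role forced to FOLLOWER. */
    static JoinLogSketch forLeaderReplay(long fromPosition, long recordingEnd, int memberId,
                                         int replaySessionId, boolean isStartup) {
        return new JoinLogSketch(fromPosition, recordingEnd, memberId, replaySessionId,
            REPLAY_STREAM_ID, isStartup, ROLE_FOLLOWER, "aeron:ipc?session-id=" + replaySessionId);
    }

    /** LEADER_INIT: join the live Log at the append position via a Spy on the leader's own UDP channel. */
    static JoinLogSketch forLeaderInit(long appendPosition, int memberId, int logSessionId,
                                       boolean isStartup, String udpLogChannel) {
        return new JoinLogSketch(appendPosition, Long.MAX_VALUE, memberId, logSessionId,
            LOG_STREAM_ID, isStartup, ROLE_LEADER, "aeron-spy:" + udpLogChannel);
    }

    @Override
    public String toString() {
        String max = maxLogPosition == Long.MAX_VALUE ? "Long.MAX_VALUE" : Long.toString(maxLogPosition);
        return logPosition + " | " + max + " | " + memberId + " | " + logSessionId + " | "
            + logStreamId + " | " + isStartup + " | " + role + " | " + logChannel;
    }

    public static void main(String[] args) {
        System.out.println(forLeaderReplay(0, 1312, 0, 1448296859, true));
        System.out.println(forLeaderInit(1312, 0, 1448296858, true,
            "aeron:udp?tags=69|session-id=1448296858|alias=log"));
    }
}
```

The LEADER_INIT payload leaves maxLogPosition unbounded because the live Log has no known end, while the replay is bounded by the end of the Recording.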
Clustered Service
When the Clustered Service reads the JoinLog, it handles it in the same way as in LEADER_REPLAY:
- it captures the values from the JoinLog into an ActiveLogEvent
- when it checks for an ActiveLogEvent, it might find that it is still reading from an Image for LogReplay. If so, it does nothing until LogReplay is complete
- when it is no longer reading from an Image, it processes the ActiveLogEvent by subscribing to the Log stream at the specified start position, waiting for the Image, and sending a ServiceAck to say it's at the start position
- once all the above is done, if the election was triggered while the cluster was running (isStartup=false in the JoinLog), e.g. the leader died, triggering an Election that this member won, then the Clustered Service creates Egress Publications for each of the client sessions
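The deferral logic in the list above can be modelled as a small state holder polled on each duty cycle. This is a hypothetical sketch: ServiceJoinSketch, its fields and the string action log are illustrative, not Aeron's ClusteredServiceAgent. Waiting for the Image is collapsed into a single step. The egress condition assumes isStartup=false together with role LEADER (2), since a FOLLOWER role is what stops the Clustered Service reconnecting Egress sessions in LEADER_REPLAY.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Clustered Service handling a JoinLog; not Aeron's ClusteredServiceAgent.
final class ServiceJoinSketch {
    static final int ROLE_LEADER = 2;

    /** Values captured from the JoinLog. */
    static final class ActiveLogEvent {
        final long logPosition;
        final boolean isStartup;
        final int role;

        ActiveLogEvent(long logPosition, boolean isStartup, int role) {
            this.logPosition = logPosition;
            this.isStartup = isStartup;
            this.role = role;
        }
    }

    private final int clientSessionCount;
    private ActiveLogEvent pending;
    private boolean readingLogReplayImage;
    final List<String> actions = new ArrayList<>(); // what the service did, in order

    ServiceJoinSketch(int clientSessionCount) {
        this.clientSessionCount = clientSessionCount;
    }

    void onJoinLog(ActiveLogEvent event) {
        pending = event; // only capture it here; it is acted on later in doWork()
    }

    void setReadingLogReplayImage(boolean reading) {
        readingLogReplayImage = reading;
    }

    /** Called on each duty cycle. */
    void doWork() {
        if (pending == null || readingLogReplayImage) {
            return; // nothing pending, or LogReplay is not complete yet
        }
        ActiveLogEvent event = pending;
        pending = null;
        actions.add("subscribe@" + event.logPosition);  // subscribe and wait for the Image (collapsed)
        actions.add("ServiceAck@" + event.logPosition); // tell the Consensus Module it is at the start position
        if (!event.isStartup && event.role == ROLE_LEADER) {
            for (int i = 0; i < clientSessionCount; i++) {
                actions.add("egress-publication#" + i); // create an Egress Publication per client session
            }
        }
    }

    public static void main(String[] args) {
        ServiceJoinSketch service = new ServiceJoinSketch(2);
        service.setReadingLogReplayImage(true);
        service.onJoinLog(new ActiveLogEvent(1312, false, ROLE_LEADER));
        service.doWork(); // deferred: still reading the LogReplay Image
        service.setReadingLogReplayImage(false);
        service.doWork();
        System.out.println(service.actions);
        // [subscribe@1312, ServiceAck@1312, egress-publication#0, egress-publication#1]
    }
}
```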
Consensus Module
Once the Consensus Module receives the ServiceAck, it moves to LEADER_READY.
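A minimal, hypothetical sketch of that transition follows. It assumes one ServiceAck per Clustered Service (the description earlier refers to the Services in the plural), and that an ack only counts when it reports the logPosition sent in the JoinLog. LeaderInitSketch and onServiceAck are illustrative names, not Aeron's ConsensusModuleAgent.

```java
// Hypothetical model of LEADER_INIT waiting for every service's ServiceAck.
final class LeaderInitSketch {
    enum State { LEADER_INIT, LEADER_READY }

    private final long joinLogPosition;
    private final boolean[] acked;
    private State state = State.LEADER_INIT;

    LeaderInitSketch(long joinLogPosition, int serviceCount) {
        this.joinLogPosition = joinLogPosition;
        this.acked = new boolean[serviceCount];
    }

    State state() {
        return state;
    }

    /** Records an ack; moves to LEADER_READY once every service has acked at the JoinLog position. */
    void onServiceAck(int serviceId, long ackLogPosition) {
        if (serviceId < 0 || serviceId >= acked.length) {
            throw new IllegalArgumentException("unknown serviceId " + serviceId);
        }
        if (state != State.LEADER_INIT || ackLogPosition != joinLogPosition) {
            return; // stale or unexpected ack: keep waiting
        }
        acked[serviceId] = true;
        for (boolean serviceAcked : acked) {
            if (!serviceAcked) {
                return; // at least one service is not ready yet
            }
        }
        state = State.LEADER_READY;
    }

    public static void main(String[] args) {
        LeaderInitSketch election = new LeaderInitSketch(1312, 1);
        election.onServiceAck(0, 0); // ignored: not the JoinLog position
        System.out.println(election.state());
        election.onServiceAck(0, 1312);
        System.out.println(election.state());
    }
}
```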
On Exit
- its Archive is recording the Log Publication
- the Clustered Service has finished replaying the Log from LEADER_REPLAY and is now subscribed to the live Log Publication, ready to process any new Log messages
- moves to LEADER_READY