State and Snapshots
This page looks at the state held in the Consensus Module. More specifically, it looks at the state that is important enough to be written to a snapshot and reloaded at startup.
The majority of the state within Aeron Cluster is the application state in the Clustered Service. The Consensus Module also contains a small amount of state, which it writes to a separate snapshot whenever the Clustered Service is snapshotted.
The overall snapshotting process is described later in Snapshots.
We'll look at the state based on the contents of the snapshot.
Snapshots
The ConsensusModuleAgent takes snapshots using a ConsensusModuleSnapshotTaker and loads them using a ConsensusModuleSnapshotAdapter.
The outline of a snapshot is as follows. Each part is described below.
ConsensusModuleSnapshot {
SnapshotMarker (START)
ConsensusModule // some of its state
ClusterSession // one for each cluster session
Timer // one for each timer in the TimerService
PendingMessageTracker // one per ClusteredService
SnapshotMarker (END) // same content as the start marker
}
SnapshotMarker
This is used to mark the start and end of a snapshot. It is also used in Clustered Service snapshots.
SnapshotMarker {
typeId long // 1 for the Consensus Module (2 for a Clustered Service)
logPosition long // Log position at which the snapshot was taken
leadershipTermId long // current leadership term when the snapshot was taken
index int // currently unused (hardcoded to 0)
mark SnapshotMark // enum: BEGIN, SECTION, END
timeUnit ClusterTimeUnit // enum: MILLIS, MICROS, NANOS
appVersion int // application version, default 0.0.1 encoded into an int
}
SnapshotMark BEGIN and END identify the marker type. SECTION is currently unused.
appVersion is an application version number that your application can set in the ConsensusModule, should it wish to. It is a semantic version that defaults to 0.0.1.
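For illustration, here is a minimal sketch of setting it, assuming the standard ConsensusModule.Context configuration path; Agrona's SemanticVersion packs major/minor/patch into the single int that ends up in the SnapshotMarker (version 1.2.3 is an arbitrary example):

import io.aeron.cluster.ConsensusModule;
import org.agrona.SemanticVersion;

// Sketch only: 1.2.3 is an arbitrary example version.
// SemanticVersion.compose packs major/minor/patch into one int,
// which is the value written to the SnapshotMarker's appVersion field.
final int appVersion = SemanticVersion.compose(1, 2, 3);

final ConsensusModule.Context ctx = new ConsensusModule.Context()
    .appVersion(appVersion);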
ConsensusModule
ConsensusModule {
nextSessionId long // clusterSessionId to assign to the next new ClusterSession
nextServiceSessionId long
logServiceSessionId long
pendingMessageCapacity int // capacity of a container
}
todo: complete once I've done PendingServiceMessageTracker
ClusterSession
ClusterSession {
clusterSessionId long // this session's id
correlationId long // correlationId from the last Ingress message received
openedLogPosition long // Log position after SessionOpenEvent was sent for the session
timeOfLastActivity long // hardcoded to NULL_VALUE (-1)
closeReason CloseReason // enum: CLIENT_ACTION, SERVICE_ACTION, TIMEOUT
responseStreamId int // for the session's Egress
responseChannel String // for the session's Egress
}
An entry is written for each ClusterSession created for a client connection, but only for those that are OPEN or CLOSING. Temporary ClusterSessions created by tools like ClusterTool are not added to the snapshot (they are not in the sessions field, which is used for the snapshot).
closeReason is set to CloseReason.NULL_VAL if the session is OPEN, which is used to determine the session's state when the snapshot is loaded.
Why store these though? If a cluster is running and an Election happens, the new leader re-establishes client connections, as Elections are usually fast and clients are likely to be waiting (the default session timeout is 10 seconds). But snapshots are only loaded when the cluster starts, and client connections are not re-established at startup; they are actively closed at the end of the first Election.
The reason is that the Clustered Services need to be notified that the ClusterSessions have closed, otherwise they won't know. They take all their input from the Log, so they need a Log message. After loading the snapshot and completing an Election, the leader ConsensusModuleAgent checks for session timeouts as part of its normal duty cycle. It finds the CLOSING ClusterSessions to be inactive, because timeOfLastActivity was set to -1 in the snapshot, then, for each of them, publishes a SessionCloseEvent to the Log before discarding the session.
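To make that concrete, here is a minimal sketch of the service side, using the onSessionClose callback from the ClusteredService interface (the body is illustrative):

import io.aeron.cluster.codecs.CloseReason;
import io.aeron.cluster.service.ClientSession;

// Invoked when a SessionCloseEvent for this session is read from the Log,
// including those the leader publishes for CLOSING sessions restored from
// a snapshot. The body below is illustrative.
public void onSessionClose(final ClientSession session, final long timestamp, final CloseReason closeReason)
{
    if (closeReason == CloseReason.TIMEOUT)
    {
        // release any per-session application state
    }
}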
Timer
Timer {
correlationId long // timer id, provided by the application
deadline long // expiry time (epoch time in cluster time units)
}
When the Clustered Service creates a Timer, it sends a ScheduleTimer message to its Consensus Module, which adds the timer to its TimerService. The timers are written to the snapshot so they can be recreated after a restart. Any timers that expire while the cluster is not running will fire when it next starts.
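As a rough sketch of that round trip from within a ClusteredService (RECONCILE_TIMER_ID and the 5-second delay are illustrative, and millisecond cluster time units are assumed):

import io.aeron.cluster.service.Cluster;

private static final long RECONCILE_TIMER_ID = 42L; // illustrative timer id

void scheduleReconcile(final Cluster cluster)
{
    // The deadline is absolute, in cluster time units. scheduleTimer sends
    // the ScheduleTimer message and returns false when back-pressured.
    while (!cluster.scheduleTimer(RECONCILE_TIMER_ID, cluster.time() + 5_000))
    {
        cluster.idleStrategy().idle();
    }
}

// Called when the Consensus Module's TimerService expires the timer and
// the resulting TimerEvent is read from the Log.
public void onTimerEvent(final long correlationId, final long timestamp)
{
    if (correlationId == RECONCILE_TIMER_ID)
    {
        // perform the scheduled work
    }
}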
PendingMessageTracker
PendingMessageTracker {
nextServiceSessionId long
logServiceSessionId long
pendingMessageCapacity int
serviceId int
}
todo: complete once I've done PendingServiceMessageTracker
PendingServiceMessageTracker
todo: write up rough notes
Sweep Uncommitted Entries
todo: write up rough notes