State and Snapshots
This page looks at the state held in the Clustered Service. More specifically, it looks at the state that is important enough that it needs to be written to a snapshot and reloaded at startup.
The overall snapshotting process is described later in Snapshots.
We'll look at the state based on the contents of the snapshot.
Snapshots
The ClusteredServiceAgent takes snapshots using a ServiceSnapshotTaker and loads them using a ServiceSnapshotLoader. Snapshots start with state from the ClusteredServiceAgent, followed by application state from the ClusteredService.
The outline of a snapshot is as follows. Each part is described below.
ClusteredServiceSnapshot {
    SnapshotMarker (BEGIN)
    ClientSession            // one for each client session
    SnapshotMarker (END)     // same fields as the begin marker, with mark set to END
    ClusteredService         // application state
}
SnapshotMarker
This is used to mark the start and end of a snapshot. It is also used in Consensus Module snapshots.
SnapshotMarker {
    typeId            long             // 2 for a ClusteredService (1 for the Consensus Module)
    logPosition       long             // log position at which the snapshot was taken
    leadershipTermId  long             // current leadership term when the snapshot was taken
    index             int              // currently unused (hardcoded to 0)
    mark              SnapshotMark     // enum: BEGIN, SECTION, END
    timeUnit          ClusterTimeUnit  // enum: MILLIS, MICROS, NANOS
    appVersion        int              // application version, default 0.0.1 encoded into an int
}
The SnapshotMark values BEGIN and END identify the marker type. SECTION is currently unused.
appVersion is an application version number that your application can set in the ClusteredServiceContainer, should it wish to. It is a semantic version that defaults to 0.0.1.
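To make "encoded into an int" concrete, here is a minimal sketch of packing a semantic version into a single int, one byte per component. The layout (major << 16 | minor << 8 | patch) is an assumption based on how Agrona's SemanticVersion composes versions; the class and method names below are hypothetical and stand alone, they are not the library's code.

```java
final class AppVersionSketch
{
    // Packs major.minor.patch into one int, one byte per component (assumed layout).
    static int compose(final int major, final int minor, final int patch)
    {
        if (major < 0 || major > 255 || minor < 0 || minor > 255 || patch < 0 || patch > 255)
        {
            throw new IllegalArgumentException("each component must be in the range 0-255");
        }

        return (major << 16) | (minor << 8) | patch;
    }

    static int major(final int version)
    {
        return (version >> 16) & 0xFF;
    }

    static int minor(final int version)
    {
        return (version >> 8) & 0xFF;
    }

    static int patch(final int version)
    {
        return version & 0xFF;
    }

    // Renders the packed int back into the familiar dotted form.
    static String format(final int version)
    {
        return major(version) + "." + minor(version) + "." + patch(version);
    }

    public static void main(final String[] args)
    {
        final int defaultVersion = compose(0, 0, 1);
        System.out.println(defaultVersion + " -> " + format(defaultVersion)); // prints "1 -> 0.0.1"
    }
}
```

Under this layout the default 0.0.1 is simply the int 1, which is the value you would expect to see in the appVersion field of a marker if the application never sets a version.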
ClientSession
ClientSession {
    clusterSessionId  long    // its session id
    responseStreamId  int     // Egress streamId
    responseChannel   String  // Egress channel
    encodedPrincipal  byte[]  // from the SessionOpenEvent
}
Unlike the Consensus Module, the Clustered Service writes all of its ClientSessions to the snapshot.
ClusteredService
When taking a snapshot, the ClusteredServiceAgent writes its own state to a snapshotPublication, then asks the ClusteredService to continue, by writing application state in onTakeSnapshot().
void onTakeSnapshot(ExclusivePublication snapshotPublication);
The opposite happens when loading the snapshot. This happens once, when the Cluster member starts. If there is a snapshot, the ClusteredServiceAgent starts by loading its own state. When it calls onStart(), it passes the snapshotImage at the correct position, so that the ClusteredService can continue by loading its application state.
void onStart(Cluster cluster, Image snapshotImage);
ClusteredService snapshot format
The ClusteredService is free to use whatever format it likes for snapshotting its state. It can use generated codecs, as Aeron Cluster itself does, or write a huge JSON string. Aeron Cluster doesn't care.
As a recommendation, what I have seen work best is to separate the application state from its snapshot representation. For example, the application may have a map of Accounts, keyed by accountId. If an Account contains its own accountId, there is no need to represent the map in the snapshot. Accounts can be written out in a list and the map can be rebuilt when the snapshot is loaded.
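The Accounts example can be sketched roughly as follows. This is only an illustration under assumptions: the Account fields, the count-prefixed byte layout and the class names are all hypothetical, and a plain ByteBuffer stands in for the snapshotPublication and snapshotImage so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class AccountSnapshotSketch
{
    // Hypothetical application object that carries its own key.
    static final class Account
    {
        final long accountId;
        final long balance;

        Account(final long accountId, final long balance)
        {
            this.accountId = accountId;
            this.balance = balance;
        }
    }

    // Snapshot representation: a count followed by the accounts as a flat list.
    static ByteBuffer write(final Collection<Account> accounts)
    {
        final ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + accounts.size() * 2 * Long.BYTES);
        buffer.putInt(accounts.size());
        for (final Account account : accounts)
        {
            buffer.putLong(account.accountId).putLong(account.balance);
        }
        buffer.flip();

        return buffer;
    }

    // Rebuilds the in-memory map from the list; each key comes from the Account itself.
    static Map<Long, Account> load(final ByteBuffer buffer)
    {
        final int count = buffer.getInt();
        final Map<Long, Account> accountsById = new HashMap<>();
        for (int i = 0; i < count; i++)
        {
            final long accountId = buffer.getLong();
            final long balance = buffer.getLong();
            accountsById.put(accountId, new Account(accountId, balance));
        }

        return accountsById;
    }

    public static void main(final String[] args)
    {
        final ByteBuffer snapshot = write(Arrays.asList(new Account(7L, 100L), new Account(9L, 250L)));
        final Map<Long, Account> accountsById = load(snapshot);
        System.out.println("accounts loaded: " + accountsById.size());
    }
}
```

In a real service, the writing side would live in onTakeSnapshot() and the loading side in onStart(). The point of the sketch is only that the map never appears in the snapshot; it is derived state.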
It is useful to have an explicit model for the snapshot representation, ideally something generated from a schema, like SBE. Then you can represent different versions of the objects in the snapshot. You might support versions 3 and 4 of an Account, so you can load an Account with version 3 and write it out to the next snapshot as version 4. Older versions can be deleted when they are no longer used.
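As a rough sketch of loading version 3 and writing version 4, the example below tags each Account with a version number. Everything here is an assumption made for illustration: the field that version 4 adds (creditLimit), its default value, and the hand-rolled byte layout. A schema tool such as SBE would generate the equivalent codecs and handle the versioning for you.

```java
import java.nio.ByteBuffer;

final class VersionedAccountSketch
{
    static final int V3 = 3; // hypothetical older layout: accountId, balance
    static final int V4 = 4; // hypothetical current layout: adds creditLimit

    static final class Account
    {
        final long accountId;
        final long balance;
        final long creditLimit;

        Account(final long accountId, final long balance, final long creditLimit)
        {
            this.accountId = accountId;
            this.balance = balance;
            this.creditLimit = creditLimit;
        }
    }

    // Reads one Account, accepting either supported version.
    static Account read(final ByteBuffer buffer)
    {
        final int version = buffer.getInt();
        if (version != V3 && version != V4)
        {
            throw new IllegalStateException("unsupported Account version: " + version);
        }

        final long accountId = buffer.getLong();
        final long balance = buffer.getLong();
        // V3 records have no creditLimit, so fall back to a default of 0.
        final long creditLimit = version == V4 ? buffer.getLong() : 0L;

        return new Account(accountId, balance, creditLimit);
    }

    // Always writes the newest version, so old layouts age out of new snapshots.
    static void write(final ByteBuffer buffer, final Account account)
    {
        buffer.putInt(V4)
            .putLong(account.accountId)
            .putLong(account.balance)
            .putLong(account.creditLimit);
    }

    public static void main(final String[] args)
    {
        final ByteBuffer oldSnapshot = ByteBuffer.allocate(64);
        oldSnapshot.putInt(V3).putLong(42L).putLong(1_000L);
        oldSnapshot.flip();

        final Account account = read(oldSnapshot);

        final ByteBuffer newSnapshot = ByteBuffer.allocate(64);
        write(newSnapshot, account);
        newSnapshot.flip();

        System.out.println("written version: " + newSnapshot.getInt(0)); // prints "written version: 4"
    }
}
```

Once every snapshot still in use has been written by this code, the V3 branch in read() is dead and can be removed.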