State and Snapshots

This page looks at the state held in the Clustered Service. More specifically, it looks at the state that must be written to a snapshot and reloaded at startup.

The overall snapshotting process is described later in Snapshots.

We'll look at the state based on the contents of the snapshot.

Snapshots

The ClusteredServiceAgent takes snapshots using a ServiceSnapshotTaker and loads them using a ServiceSnapshotLoader. Snapshots start with state from the ClusteredServiceAgent, followed by application state from the ClusteredService.

The outline of a snapshot is as follows. Each part is described below.

ClusteredServiceSnapshot {
    SnapshotMarker (BEGIN)
    ClientSession            // one for each client session
    SnapshotMarker (END)     // same fields as the BEGIN marker, with mark set to END
    ClusteredService         // application state
}

SnapshotMarker

This is used to mark the start and end of a snapshot. It is also used in Consensus Module snapshots.

SnapshotMarker {
    typeId            long              // 2 for a ClusteredService (1 for the Consensus Module)
    logPosition       long              // Log position at which the snapshot was taken
    leadershipTermId  long              // current leadership term when the snapshot was taken
    index             int               // currently unused (hardcoded to 0)
    mark              SnapshotMark      // enum: BEGIN, SECTION, END
    timeUnit          ClusterTimeUnit   // enum: MILLIS, MICROS, NANOS
    appVersion        int               // application version, default 0.0.1 encoded into an int
}

The SnapshotMark values BEGIN and END identify the marker type. SECTION is currently unused.

appVersion is an application version number that your application can set in the ClusteredServiceContainer, should it wish to. It is a semantic version that defaults to 0.0.1.
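
To make "encoded into an int" concrete, here is a minimal sketch that packs a semantic version into an int, one byte per component. I believe this matches the layout produced by Agrona's SemanticVersion.compose(), but treat that as an assumption and check it against the library. AppVersionSketch and its methods are hypothetical names used only for illustration.

```java
/** Hypothetical helper, not part of Aeron: packs major.minor.patch into one int. */
final class AppVersionSketch {
    private AppVersionSketch() {
    }

    /** Each component must fit in one byte (0..255). */
    static int compose(final int major, final int minor, final int patch) {
        checkRange("major", major);
        checkRange("minor", minor);
        checkRange("patch", patch);
        return (major << 16) | (minor << 8) | patch;
    }

    static int major(final int version) {
        return (version >> 16) & 0xFF;
    }

    static int minor(final int version) {
        return (version >> 8) & 0xFF;
    }

    static int patch(final int version) {
        return version & 0xFF;
    }

    /** Renders the packed version back into dotted form. */
    static String format(final int version) {
        return major(version) + "." + minor(version) + "." + patch(version);
    }

    private static void checkRange(final String name, final int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException(name + " must be in 0..255: " + value);
        }
    }
}
```

With this layout, the default version 0.0.1 is stored as the int value 1, and `AppVersionSketch.format(AppVersionSketch.compose(1, 2, 3))` gives back "1.2.3".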

ClientSession

ClientSession {
    clusterSessionId  long      // its session id
    responseStreamId  int       // Egress streamId
    responseChannel   String    // Egress channel
    encodedPrincipal  byte[]    // from the SessionOpenEvent
}

Unlike the Consensus Module, the Clustered Service writes all of its ClientSessions to the snapshot.

ClusteredService

When taking a snapshot, the ClusteredServiceAgent writes its own state to a snapshotPublication, then asks the ClusteredService to continue by writing its application state to the same publication in onTakeSnapshot().

    void onTakeSnapshot(ExclusivePublication snapshotPublication);

The opposite happens when loading the snapshot. This happens once, when the Cluster member starts. If there is a snapshot, the ClusteredServiceAgent starts by loading its own state. It then calls onStart(), passing the snapshotImage positioned just after that state, so the ClusteredService can continue by loading its application state. If there is no snapshot, snapshotImage is null.

    void onStart(Cluster cluster, Image snapshotImage);
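
The hand-off between the agent and the service can be sketched without Aeron at all. In the sketch below, a ByteBuffer stands in for both the snapshotPublication and the snapshotImage, and a single long stands in for the agent's state (the markers and client sessions). The real API works with an ExclusivePublication and an Image, so this is only an illustration of the ordering. SnapshotHandoffSketch, Service and CounterService are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of the agent/service hand-off; not the Aeron API. */
final class SnapshotHandoffSketch {
    private SnapshotHandoffSketch() {
    }

    /** Stand-in for a ClusteredService, with a ByteBuffer replacing publication and image. */
    interface Service {
        void onTakeSnapshot(ByteBuffer snapshot);

        void onStart(ByteBuffer snapshot); // null when there is no snapshot
    }

    /** Toy service whose entire application state is one counter. */
    static final class CounterService implements Service {
        long counter;

        public void onTakeSnapshot(final ByteBuffer snapshot) {
            snapshot.putLong(counter);
        }

        public void onStart(final ByteBuffer snapshot) {
            counter = (snapshot == null) ? 0L : snapshot.getLong();
        }
    }

    /** Agent side: write agent state first, then let the service append its state. */
    static ByteBuffer takeSnapshot(final long logPosition, final Service service) {
        final ByteBuffer snapshot = ByteBuffer.allocate(64);
        snapshot.putLong(logPosition);      // stand-in for markers and client sessions
        service.onTakeSnapshot(snapshot);   // service continues where the agent stopped
        snapshot.flip();
        return snapshot;
    }

    /** Agent side: read agent state first, then pass the buffer on at that position. */
    static long loadSnapshot(final ByteBuffer snapshot, final Service service) {
        final long logPosition = snapshot.getLong();
        service.onStart(snapshot);          // buffer is positioned at the application state
        return logPosition;
    }
}
```

The point of the sketch is the ordering: the service never sees the agent's part of the snapshot, because by the time it is called the read position has already moved past it.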

ClusteredService snapshot format

The ClusteredService is free to use whatever format it likes for snapshotting its state. It can use generated codecs, as Aeron Cluster itself does, or write a huge JSON string. Aeron Cluster doesn't care.

My recommendation, based on what I have seen work best, is to separate the application state from its snapshot representation. For example, the application may have a map of Accounts, keyed by accountId. If an Account contains its own accountId, there is no need to represent the map in the snapshot. Accounts can be written out in a list and the map can be rebuilt when the snapshot is loaded.
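
As a minimal sketch of the Accounts example, the code below flattens the live map into a list for the snapshot and rebuilds the map from that list on load. The Account fields and the class name AccountSnapshotSketch are hypothetical, and the list stands in for whatever encoding is actually written to the snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: the snapshot holds a list, the live state holds a map. */
final class AccountSnapshotSketch {
    private AccountSnapshotSketch() {
    }

    /** An Account carries its own id, so the map key is redundant in the snapshot. */
    static final class Account {
        final long accountId;
        final long balance;

        Account(final long accountId, final long balance) {
            this.accountId = accountId;
            this.balance = balance;
        }
    }

    /** Snapshot representation: just the accounts, with no map structure. */
    static List<Account> toSnapshot(final Map<Long, Account> accountsById) {
        return new ArrayList<>(accountsById.values());
    }

    /** Live representation: rebuild the index from each account's own id. */
    static Map<Long, Account> fromSnapshot(final List<Account> accounts) {
        final Map<Long, Account> accountsById = new HashMap<>();
        for (final Account account : accounts) {
            accountsById.put(account.accountId, account);
        }
        return accountsById;
    }
}
```

Keeping the two representations separate means the in-memory structure can change (a different map type, or a second index) without touching the snapshot format.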

It is useful to have an explicit model for the snapshot representation, ideally something generated from a schema, like SBE. Then you can represent different versions of the objects in the snapshot. You might support versions 3 and 4 of an Account, so you can load an Account with version 3 and write it out to the next snapshot as version 4. Older versions can be deleted when they are no longer used.
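
The version 3 and version 4 example can be sketched as follows. This is a hand-rolled codec over a ByteBuffer, not SBE-generated code, and the details are assumptions made for illustration: each Account is prefixed with its version, and version 4 adds a currency field that version 3 lacks. VersionedAccountCodecSketch, the field names and DEFAULT_CURRENCY are all hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical hand-rolled codec: reads Account v3 or v4, always writes v4. */
final class VersionedAccountCodecSketch {
    static final int V3 = 3;
    static final int V4 = 4;
    static final int DEFAULT_CURRENCY = 0; // assumed value for accounts written before v4

    private VersionedAccountCodecSketch() {
    }

    static final class Account {
        final long accountId;
        final long balance;
        final int currency; // added in v4

        Account(final long accountId, final long balance, final int currency) {
            this.accountId = accountId;
            this.balance = balance;
            this.currency = currency;
        }
    }

    /** New snapshots only ever contain the latest version. */
    static void encode(final Account account, final ByteBuffer out) {
        out.putInt(V4);
        out.putLong(account.accountId);
        out.putLong(account.balance);
        out.putInt(account.currency);
    }

    /** Old snapshots may contain v3, so loading supports both versions. */
    static Account decode(final ByteBuffer in) {
        final int version = in.getInt();
        if (version != V3 && version != V4) {
            throw new IllegalStateException("unsupported Account version: " + version);
        }
        final long accountId = in.getLong();
        final long balance = in.getLong();
        final int currency = (version == V4) ? in.getInt() : DEFAULT_CURRENCY;
        return new Account(accountId, balance, currency);
    }
}
```

Once every snapshot still in use has been written by this code, no version 3 Accounts remain, and the V3 branch in decode() can be deleted.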