State and Snapshots
This page looks at the state held in the Consensus Module. More specifically, it looks at the state that is important enough to be written to a snapshot and reloaded at startup.
The majority of the state within Aeron Cluster is the application state in the Clustered Service. The Consensus Module also contains a small amount of state, which it writes to a separate snapshot whenever the Clustered Service is snapshotted.
The overall snapshotting process is described later in Snapshots.
We'll look at the state based on the contents of the snapshot.
Snapshots
The ConsensusModuleAgent takes snapshots using a ConsensusModuleSnapshotTaker and loads them using a ConsensusModuleSnapshotAdapter.
The outline of a snapshot is as follows. Each part is described below.
ConsensusModuleSnapshot {
SnapshotMarker (START)
ConsensusModule // some of its state
ClusterSession // one for each cluster session
Timer // one for each timer in the TimerService
PendingMessageTracker // one per ClusteredService
SnapshotMarker (END) // same content as the start marker
}
SnapshotMarker
This is used to mark the start and end of a snapshot. It is also used in Clustered Service snapshots.
SnapshotMarker {
typeId long // 1 for the Consensus Module (2 for a Clustered Service)
logPosition long // Log position at which the snapshot was taken
leadershipTermId long // current leadership term when the snapshot was taken
index int // currently unused (hardcoded to 0)
mark SnapshotMark // enum: BEGIN, SECTION, END
timeUnit ClusterTimeUnit // enum: MILLIS, MICROS, NANOS
appVersion int // application version, default 0.0.1 encoded into an int
}
SnapshotMark BEGIN and END identify the marker type. SECTION is currently unused.
appVersion is an application version number that your application can set in the ConsensusModule, should it wish to. It is a semantic version that defaults to 0.0.1.
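For illustration, here is a minimal sketch of setting it, assuming the standard ConsensusModule.Context configuration path; Agrona's SemanticVersion packs major/minor/patch into the single int that ends up in the SnapshotMarker (version 1.2.3 is an arbitrary example):

import io.aeron.cluster.ConsensusModule;
import org.agrona.SemanticVersion;

// Sketch only: 1.2.3 is an arbitrary example version.
// SemanticVersion.compose packs major/minor/patch into one int,
// which is the value written to the SnapshotMarker's appVersion field.
final int appVersion = SemanticVersion.compose(1, 2, 3);

final ConsensusModule.Context ctx = new ConsensusModule.Context()
    .appVersion(appVersion);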
ConsensusModule
ConsensusModule {
nextSessionId long // clusterSessionId to assign to the next new ClusterSession
nextServiceSessionId long
logServiceSessionId long
pendingMessageCapacity int // capacity of a container
}
todo: complete once I've done PendingServiceMessageTracker
ClusterSession
ClusterSession {
clusterSessionId long // this session's id
correlationId long // correlationId from the last Ingress message received
openedLogPosition long // Log position after SessionOpenEvent was sent for the session
timeOfLastActivity long // hardcoded to NULL_VALUE (-1)
closeReason CloseReason // enum: CLIENT_ACTION, SERVICE_ACTION, TIMEOUT
responseStreamId int // for the session's Egress
responseChannel String // for the session's Egress
}
An entry is written for each ClusterSession created for a client connection, but only for those that are OPEN or CLOSING. Temporary ClusterSessions created by tools like ClusterTool are not added to the snapshot (they are not in the sessions field, which is used for the snapshot).
closeReason is set to CloseReason.NULL_VAL if the session is OPEN, which is used to determine the session's state when the snapshot is loaded.
Why store these though? If a cluster is running and an Election happens, the new leader re-establishes client connections, as Elections are usually fast and clients are likely to be waiting (the default session timeout is 10 seconds). But snapshots are only loaded when the cluster starts, and client connections are not re-established at startup; they are actively closed at the end of the first Election.
The reason is that the Clustered Services need to be notified that the ClusterSessions have closed, otherwise they won't know. They take all their input from the Log, so they need a Log message. After loading the snapshot and completing an Election, the leader ConsensusModuleAgent checks for session timeouts as part of its normal duty cycle. It finds the CLOSING ClusterSessions to be inactive, because timeOfLastActivity was set to -1 in the snapshot, then, for each of them, publishes a SessionCloseEvent to the Log before discarding the session.
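To make that concrete, here is a minimal sketch of the service side, using the onSessionClose callback from the ClusteredService interface (the body is illustrative):

import io.aeron.cluster.codecs.CloseReason;
import io.aeron.cluster.service.ClientSession;

// Invoked when a SessionCloseEvent for this session is read from the Log,
// including those the leader publishes for CLOSING sessions restored from
// a snapshot. The body below is illustrative.
public void onSessionClose(final ClientSession session, final long timestamp, final CloseReason closeReason)
{
    if (closeReason == CloseReason.TIMEOUT)
    {
        // release any per-session application state
    }
}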
Timer
Timer {
correlationId long // timer id, provided by the application
deadline long // expiry time (epoch time in cluster time units)
}
When the Clustered Service creates a Timer, it sends a ScheduleTimer message to its Consensus Module, which adds the timer to its TimerService. The timers are written to the snapshot so they can be recreated after a restart. Any timers that expire while the cluster is not running will fire when it next starts.
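As a rough sketch of that round trip from within a ClusteredService (RECONCILE_TIMER_ID and the 5-second delay are illustrative, and millisecond cluster time units are assumed):

import io.aeron.cluster.service.Cluster;

private static final long RECONCILE_TIMER_ID = 42L; // illustrative timer id

void scheduleReconcile(final Cluster cluster)
{
    // The deadline is absolute, in cluster time units. scheduleTimer sends
    // the ScheduleTimer message and returns false when back-pressured.
    while (!cluster.scheduleTimer(RECONCILE_TIMER_ID, cluster.time() + 5_000))
    {
        cluster.idleStrategy().idle();
    }
}

// Called when the Consensus Module's TimerService expires the timer and
// the resulting TimerEvent is read from the Log.
public void onTimerEvent(final long correlationId, final long timestamp)
{
    if (correlationId == RECONCILE_TIMER_ID)
    {
        // perform the scheduled work
    }
}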
PendingMessageTracker
PendingMessageTracker {
nextServiceSessionId long
logServiceSessionId long
pendingMessageCapacity int
serviceId int
}
todo: complete once I've done PendingServiceMessageTracker
PendingServiceMessageTracker
todo: write up rough notes
Sweep Uncommitted Entries
todo: write up rough notes