Running Aeron Cluster
Aeron Cluster is composed of Agrona Agents, much like Aeron Transport and Aeron Archive. The Aeron Cluster agents are:
- The ConsensusModuleAgent, which performs all the consensus work. It can run on its own thread (the default), or an invoker can be created for it, so you can invoke the ConsensusModuleAgent from your own thread
- The ClusteredServiceAgent, which wraps the ClusteredService (your code) and always runs on its own thread
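The two threading options for an agent can be pictured with a minimal duty-cycle model. This is a self-contained sketch of the pattern only, not Agrona's actual `Agent`/`AgentRunner`/`AgentInvoker` classes:

```java
// Minimal model of the duty-cycle pattern used by Agrona agents.
// Illustrative sketch only — not Agrona's real API.
interface DutyCycle {
    int doWork(); // returns amount of work done, 0 when idle
}

final class CountingAgent implements DutyCycle {
    int invocations;
    public int doWork() { invocations++; return 1; }
}

public class AgentThreadingSketch {
    // Runs one agent on a dedicated thread and one via direct invocation,
    // returning { threadedCount, invokedCount }.
    static int[] runBoth() {
        try {
            // Option 1: the agent's duty cycle runs on its own thread (the default).
            CountingAgent threaded = new CountingAgent();
            Thread runner = new Thread(() -> {
                for (int i = 0; i < 10; i++) {
                    threaded.doWork();
                }
            });
            runner.start();
            runner.join();

            // Option 2: an invoker — your own thread calls doWork() whenever it chooses.
            CountingAgent invoked = new CountingAgent();
            invoked.doWork();
            invoked.doWork();

            return new int[] { threaded.invocations, invoked.invocations };
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = runBoth();
        System.out.println("threaded=" + counts[0] + " invoked=" + counts[1]);
    }
}
```

The invoker form matters when you want to pin all work to one thread you control, for example to share a single busy-spin thread across components.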
The Agrona Agents are shown below in red (this is per cluster node):
In addition to the above two agents, Aeron Cluster also requires the agents for Aeron Archive and Aeron Transport. There is a composite ClusteredMediaDriver that can be used to start:
- a MediaDriver, which starts the 3 Aeron Transport agents (Receiver, Driver Conductor and Sender)
- an Archive, which starts the 3 Aeron Archive agents (Archive Conductor, Recorder and Replayer)
- a ConsensusModule, which starts the ConsensusModuleAgent
The ClusteredMediaDriver is not an agent / component in its own right. It is a factory for creating the other agents and for closing them all on shutdown.
The ClusteredServiceAgent must be created separately, using a ClusteredServiceContainer.
As with Aeron Transport and Archive, there is a lot of flexibility in how everything can be run. You don't need to use a ClusteredMediaDriver - you can start each part of Aeron yourself. You can even run the ClusteredServiceContainer in its own process, separate from the ConsensusModule, as all the components talk via shared memory. They do, however, all need to run on the same machine, since shared memory does not span hosts.
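Starting each part yourself looks roughly like the following. This is a wiring sketch only: the Contexts are shown unconfigured for brevity, whereas a real node needs directories, channels and cluster member configuration set on each Context.

```java
// Sketch: starting each component yourself instead of using ClusteredMediaDriver.
import io.aeron.archive.Archive;
import io.aeron.cluster.ConsensusModule;
import io.aeron.cluster.service.ClusteredServiceContainer;
import io.aeron.driver.MediaDriver;

try (
    MediaDriver driver = MediaDriver.launch(new MediaDriver.Context());           // Aeron Transport agents
    Archive archive = Archive.launch(new Archive.Context());                      // Aeron Archive agents
    ConsensusModule consensusModule =
        ConsensusModule.launch(new ConsensusModule.Context());                    // ConsensusModuleAgent
    ClusteredServiceContainer container =
        ClusteredServiceContainer.launch(new ClusteredServiceContainer.Context())) // ClusteredServiceAgent
{
    // The node is now running; a real deployment would block here until shutdown.
}
```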
Multiple Clustered Services¶
It is possible to run more than one Clustered Service on each cluster node. Each Clustered Service would read the same log messages, but could choose to ignore those that don't apply to it. You might want to do this in order to co-locate different services together (if that makes sense for your business), or you might want to run multiple instances of the same service as different shards that each process a different partition of the log (as arranged by themselves).
Each Clustered Service would be created by its own ClusteredServiceContainer, which would create a dedicated ClusteredServiceAgent for it. They would each run in their own thread, reading from the log at their own pace, so they can be out of sync in their processing.
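The idea of several services consuming the same log independently can be sketched in plain Java. This is an illustrative model only, not Aeron's log machinery: each "service" keeps its own position in a shared, append-only list and skips entries it doesn't care about.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model: a shared append-only log with per-service read positions.
final class LogModel {
    record Entry(String service, String payload) {}

    // A consumer that only processes entries addressed to it, at its own pace.
    static final class ServiceReader {
        final String name;
        int position; // this reader's own position in the log
        final List<String> processed = new ArrayList<>();

        ServiceReader(String name) { this.name = name; }

        // Poll up to 'limit' entries, skipping those meant for other services.
        void poll(List<Entry> log, int limit) {
            for (int i = 0; i < limit && position < log.size(); i++, position++) {
                Entry e = log.get(position);
                if (e.service().equals(name)) {
                    processed.add(e.payload());
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
            new Entry("auction", "bid:10"),
            new Entry("audit", "login"),
            new Entry("auction", "bid:12"));

        ServiceReader auction = new ServiceReader("auction");
        ServiceReader audit = new ServiceReader("audit");

        auction.poll(log, 3); // reads all three entries, keeps the two auction ones
        audit.poll(log, 1);   // lags behind: has only examined the first entry so far

        System.out.println(auction.processed + " " + audit.processed);
        // → [bid:10, bid:12] []
    }
}
```

Note that both readers see the entries in the same order; they only differ in how far they have read, which is exactly the "out of sync in their processing" point above.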
As Clustered Services should be isolated from the outside world and only take input from the log, Clustered Services should not talk to each other via some back channel. If they need to talk to each other, it should be done by publishing a message onto an ingress channel, so it can be added to the log like all other inputs. Such messages would go to the 'back of the queue', behind any other messages already on the log.
Example¶
As an example of using a ClusteredMediaDriver, take a look at the end of the main() method in BasicAuctionClusteredServiceNode. It launches a ClusteredMediaDriver and passes in the config (the Contexts) for the Media Driver, Archive and Consensus Module. Then it launches the ClusteredServiceAgent using a ClusteredServiceContainer. The ClusteredServiceContainer.Context contains an instance of your code that implements the ClusteredService interface.
```java
ClusteredMediaDriver clusteredMediaDriver = ClusteredMediaDriver.launch(
    mediaDriverContext, archiveContext, consensusModuleContext); // <1>

ClusteredServiceContainer container = ClusteredServiceContainer.launch(
    clusteredServiceContext); // <2>
```
Developer note¶
You would typically run cluster nodes on dedicated machines for resilience, but a developer might want to run a 3-node cluster on their development machine. This requires running 3 of all of the above, and obviously things like port numbers and directories (shared memory and regular) need to be kept separate. See the BasicAuctionClusteredServiceNode and associated shell script for an example.
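Keeping the three local nodes' resources apart is mostly arithmetic over the node id. A self-contained sketch follows; the directory layout and port scheme here are made-up examples for illustration, not Aeron defaults:

```java
import java.nio.file.Path;

// Derives per-node directories and ports so three local nodes don't collide.
// The base port and directory names are arbitrary examples.
public final class NodeLayout {
    // Shared-memory directory for the node's Media Driver.
    static Path aeronDir(int nodeId) {
        return Path.of(System.getProperty("java.io.tmpdir"), "aeron-node-" + nodeId);
    }

    // Regular directory for the node's cluster state and archive.
    static Path clusterDir(int nodeId) {
        return Path.of("cluster", "node-" + nodeId);
    }

    // Give each node a contiguous block of ports, e.g. node 0 -> 9000-9099.
    static int basePort(int nodeId) {
        return 9000 + nodeId * 100;
    }

    public static void main(String[] args) {
        for (int nodeId = 0; nodeId < 3; nodeId++) {
            System.out.println(nodeId + ": aeronDir=" + aeronDir(nodeId)
                + " clusterDir=" + clusterDir(nodeId)
                + " ports from " + basePort(nodeId));
        }
    }
}
```

Values like these would then be fed into the various Contexts (and the endpoint strings in the cluster members config) for each node's process.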