Network Publications - Sending Messages
Initial State
A newly created Publication will have some counters in cnc.dat and a Publication log buffer, which looks like this:
The Terms are empty (zeroed).
In the metadata section, isConnected is false, activeTermCount is 0, and initialTermId is set to a random number.
The metadata contains the tail counters TC0, TC1 and TC2, but they're also shown in the diagram next to where they
point to. Each tail counter has the TermId in the top 32 bits and the TermOffset in the low 32 bits, so TC0 will
start at 0x3ed00000000 (TermId 1005, TermOffset 0).
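The packing of a tail counter can be illustrated with a small sketch. The class and method names here (TailCounter, pack, termId, termOffset) are hypothetical and only mirror the layout described above; they are not Aeron's own API.

```java
// Hypothetical helper illustrating the tail counter layout described above:
// TermId in the top 32 bits, TermOffset in the low 32 bits.
final class TailCounter {
    // Pack a term id and a term offset into one 64-bit tail counter value.
    static long pack(int termId, int termOffset) {
        return ((long) termId << 32) | (termOffset & 0xFFFF_FFFFL);
    }

    // Extract the term id from the top 32 bits.
    static int termId(long rawTail) {
        return (int) (rawTail >>> 32);
    }

    // Extract the term offset from the low 32 bits.
    static int termOffset(long rawTail) {
        return (int) (rawTail & 0xFFFF_FFFFL);
    }

    public static void main(String[] args) {
        long tc0 = pack(1005, 0);
        System.out.println(Long.toHexString(tc0)); // prints 3ed00000000
    }
}
```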
Publishing-related Counters
- pub-pos is the publisher's position in the Publication log buffer, i.e. after the last Frame written to the log buffer. This is in absolute number of bytes (starting at zero). It is recorded in the cnc.dat file for info, but is not used as input into anything. It is easier to see it with all the other counters than to have to dig around in the log buffer to find the current tail counter.
- pub-lmt is the limit (max position) that the publisher can write to. If publishing a new message would take pub-pos beyond pub-lmt, the message would not be added and BACK_PRESSURED would be returned instead.
Sending-related Counters
- snd-pos is the Sender's position, which is the position the Sender has sent up to (in absolute number of bytes).
- snd-lmt is the limit that the Sender can send up to, which is controlled by the space available on the receiving side. This is updated based on information in SMs. If sending more data would take snd-pos beyond snd-lmt, the Sender would not send, to avoid overwhelming the Receiver, but would increment snd-bpe instead.
- snd-bpe (Sender back-pressure events) is a count of how many times the Sender couldn't send because there wasn't enough space on the Receiver. It is recorded for monitoring only - it doesn't input into anything. Whenever snd-bpe is incremented for a Publication, the system-wide SENDER_FLOW_CONTROL_LIMITS counter is also incremented - this provides a single counter that can be checked before digging into Publication-specific counters.
Significant Events
There are four significant events that can affect the log buffer:
- the application attempts to publish a message
- the Sender attempts to send some data
- the Sender receives a Status Message and updates some counters
- the Driver Conductor updates some counters and / or cleans part of the log buffer
The application, Sender and Driver Conductor usually run on different threads, so these could occur in any order, or even at the same time.
If the application attempted to publish a message at this stage, it would fail because pub-pos is at pub-lmt. The call would return NOT_CONNECTED because isConnected is false. If it was connected, BACK_PRESSURED would have been returned.
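As a rough model of the rule described above, the outcome of a publish attempt can be sketched as a function of pub-pos, pub-lmt and isConnected. The OfferCheck class and its check method are hypothetical names used for illustration only; the real Publication logic has more cases than this.

```java
// Hypothetical model of the publish check described in the text.
final class OfferCheck {
    enum Result { OK, NOT_CONNECTED, BACK_PRESSURED }

    // A message is accepted only if it would not take pub-pos beyond pub-lmt.
    // Otherwise the result depends on whether the Publication is connected.
    static Result check(boolean isConnected, long pubPos, long pubLmt, int length) {
        if (pubPos + length <= pubLmt) {
            return Result.OK;
        }
        return isConnected ? Result.BACK_PRESSURED : Result.NOT_CONNECTED;
    }
}
```

In the initial state (pub-pos and pub-lmt both zero, isConnected false) this yields NOT_CONNECTED for any message length.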
The only thing that can affect the log buffer at this stage is receiving an SM.
Sender Receives a Status Message
Most data travels from the Sender to the Receiver. The Receiver occasionally responds with a Status Message (SM), which provides flow control and back-pressure. An SM is not sent to ack each individual packet. An SM is also sent in response to a SETUP message, to establish the connection.
A Status Message contains the minimum Subscriber position (that of the slowest Subscriber), which I'll refer to as min(sub-pos). It also contains receiverWindowLength, which, unless overridden, is Configuration.INITIAL_WINDOW_LENGTH_DEFAULT (128KB) or half a term length, whichever is smaller.
When the Sender receives an SM, it sets isConnected to true. It puts min(sub-pos) and receiverWindowLength into a flow control algorithm (e.g. UnicastFlowControl) and sets snd-lmt to the result.
In other words, if a Subscription on the receiving side falls behind, it affects how much the Sender can send.
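The arithmetic can be sketched as follows. This is a simplified model, assuming the flow control algorithm places the new limit one receiver window ahead of min(sub-pos) and never moves it backwards; the class and method names are hypothetical and the real UnicastFlowControl may differ in detail.

```java
// Hypothetical sketch of how an SM could drive snd-lmt.
final class FlowControlSketch {
    // 128KB, the default initial window length mentioned in the text.
    static final int INITIAL_WINDOW_LENGTH_DEFAULT = 128 * 1024;

    // The window is the default or half a term length, whichever is smaller.
    static int receiverWindowLength(int termLength) {
        return Math.min(INITIAL_WINDOW_LENGTH_DEFAULT, termLength / 2);
    }

    // Assumption: the limit is min(sub-pos) plus the window, and it never goes backwards.
    static long onStatusMessage(long currentSndLmt, long minSubPos, int receiverWindowLength) {
        return Math.max(currentSndLmt, minSubPos + receiverWindowLength);
    }
}
```

With a 64KB term, the window is 32KB, so the first SM (min(sub-pos) of zero) would move snd-lmt from 0 to 32768 under these assumptions.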
We're not tracking Subscriber counters on this diagram as they live on the receiving side, but min(sub-pos) will be zero at this point, because we haven't sent any messages yet.
The Publication is now connected. The SM has updated snd-lmt, so the Sender has some space to send, but the application still can't publish messages, because pub-lmt is still zero. If it tried, the call would now return BACK_PRESSURED because isConnected is true. The Driver Conductor needs to run to update pub-lmt.
Driver Conductor updates Publisher Counters
When the Driver Conductor runs, it asks each NetworkPublication to update its pub-pos and pub-lmt counters.
It sets pub-pos to the current tail counter TC0. At this stage the update changes nothing, because pub-pos already has the same value as TC0; it matters once the application has published some messages. The application advances the tail counters as it publishes messages and the Driver Conductor updates pub-pos to the latest tail counter in the background.
Now that isConnected is true, the NetworkPublication also sets pub-lmt to snd-pos + termWindowLength (half a term). As the Sender's position advances, so does the publisher's limit. Now the application can publish messages from pub-pos to pub-lmt and the Sender can send messages from snd-pos to snd-lmt.
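The publisher limit calculation is simple enough to show as a one-line sketch. The class and method names are hypothetical, and the sketch only covers the connected case described above, with termWindowLength taken as half a term.

```java
// Hypothetical sketch of the pub-lmt update for a connected Publication.
final class PublisherLimitSketch {
    static long publisherLimit(long sndPos, int termLength) {
        int termWindowLength = termLength / 2; // half a term, as described in the text
        return sndPos + termWindowLength;
    }
}
```

For example, with a 64KB term and snd-pos at zero, pub-lmt becomes 32768, so the application can publish up to half a term ahead of the Sender.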
The Driver Conductor also cleans old messages, but we'll describe that later, once there are some.
Application Publishes Messages
When the application calls into the Publication to publish a message, the Publication bumps the tail counter to reserve space for the message, before writing it. If the client was using a ConcurrentPublication, there could be multiple threads writing to the log buffer, so they bump the tail counter using a CAS operation. The tail counter is used to coordinate multiple writers.
Once the tail position has been advanced, the space is reserved and another thread can advance the tail position again to make room for another message. The actual writing of the messages happens next and can occur in parallel, so Publishers don't wait for each other to write messages; they just wait for the CAS operation.
Once the Publication has bumped the tail counter by the length of the message (taking into account fragmentation, the DATA frame header, etc.), it writes each fragment in turn. For each fragment, it writes the frame header with a negative frame length, then the fragment body, then rewrites the length in the header with the positive frame length.
The consumer of the log buffer (in this case, the Sender) looks for a positive frame length as the signal that a frame is completely written and is ready to be read. The consumer cannot use the tail counter (or pub-pos), as that indicates how much space has been reserved, not whether the messages have been written yet.
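The reserve-then-write sequence described above can be sketched with a plain ByteBuffer and an AtomicLong standing in for one term and its tail counter. This is a simplified illustration, not Aeron's appender: the class name is hypothetical, only the length field of the header is written, fragmentation and end-of-term handling are omitted, and a real implementation would use ordered or volatile writes for the length field so that other threads see it correctly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-term appender illustrating the protocol in the text.
final class TermAppenderSketch {
    static final int HEADER_LENGTH = 32;   // space reserved for the frame header
    static final int FRAME_ALIGNMENT = 32; // frames start on 32-byte boundaries

    final ByteBuffer term;
    final AtomicLong tail = new AtomicLong(0);

    TermAppenderSketch(int termLength) {
        term = ByteBuffer.allocate(termLength).order(ByteOrder.LITTLE_ENDIAN);
    }

    static int align(int value) {
        return (value + FRAME_ALIGNMENT - 1) & -FRAME_ALIGNMENT;
    }

    // Returns the offset of the written frame, or -1 if the term has no room.
    int append(byte[] payload) {
        int frameLength = HEADER_LENGTH + payload.length;
        int alignedLength = align(frameLength);

        // Step 1: reserve space by bumping the tail with a CAS.
        long offset;
        do {
            offset = tail.get();
            if (offset + alignedLength > term.capacity()) {
                return -1;
            }
        } while (!tail.compareAndSet(offset, offset + alignedLength));

        int frameOffset = (int) offset;
        // Step 2: negative length marks the frame as in progress.
        term.putInt(frameOffset, -frameLength);
        // Step 3: write the body after the header.
        for (int i = 0; i < payload.length; i++) {
            term.put(frameOffset + HEADER_LENGTH + i, payload[i]);
        }
        // Step 4: positive length signals that the frame is complete.
        term.putInt(frameOffset, frameLength);
        return frameOffset;
    }
}
```

Only step 1 is contended between writers; steps 2 to 4 touch space that belongs to a single writer.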
The format of messages written into the log buffer is described in more detail on the Log Buffers page.
Blocked messages, offer() and tryClaim()
Why does the fragment length get written as a negative value first? Why not use zero?
The Publication API classes provide two methods that can be used by an application to publish a message:
- offer() - the application writes a message to a buffer, then passes the buffer in to offer(), which copies it into the log buffer, requiring a copy operation
- tryClaim() - the application passes the message length into tryClaim(), which reserves that amount of space in the log buffer, writes a Frame header with the negative length, then returns a BufferClaim. The BufferClaim wraps the area in the log buffer where the message needs writing. The application writes the message into the log buffer via the BufferClaim, then calls a commit() method, which rewrites the Frame length to be positive. This provides zero-copy semantics.
The risk with tryClaim() is that the application could fail to write the message and not call commit(), leaving it incomplete. This is known as a blocked message. If a message has been blocked for over 15 seconds, the Driver Conductor attempts to unblock it by replacing it with a PAD frame of the same length. This is what the negative length is used for. PAD frames are ignored by Subscriptions. If a message is unblocked, the UNBLOCKED_PUBLICATIONS system counter is incremented.
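A minimal sketch of the unblocking step might look like this. It assumes the frame length is the first 32 bits of the header and that the frame type is a 16-bit field at byte offset 6, with PAD represented as type 0; the class and method names are hypothetical, and the real unblocking logic handles more cases than a single negative length.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of replacing a blocked frame with a PAD frame.
final class UnblockSketch {
    static final int TYPE_OFFSET = 6;        // assumed position of the frame type field
    static final short HDR_TYPE_PAD = 0x00;  // assumed PAD frame type value
    static final long UNBLOCK_TIMEOUT_NS = 15_000_000_000L; // 15 seconds

    // Returns true if the frame at frameOffset was blocked and has been unblocked.
    static boolean unblock(ByteBuffer term, int frameOffset, long blockedForNs) {
        int frameLength = term.getInt(frameOffset);
        if (frameLength >= 0 || blockedForNs <= UNBLOCK_TIMEOUT_NS) {
            return false; // not a claimed-but-uncommitted frame, or not blocked long enough
        }
        // The negative length tells us how big the abandoned claim was.
        term.putShort(frameOffset + TYPE_OFFSET, HDR_TYPE_PAD);
        term.putInt(frameOffset, -frameLength); // same length, now positive
        return true;
    }
}
```

If the length had been left at zero instead of a negative value, the Driver Conductor could not tell how long the PAD frame needs to be from the header alone.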
It is unlikely that offer() would fail in the same way, as the application writes the message to a temporary buffer before calling offer(). All offer() has to do is copy the bytes into the log buffer. However, offer() still writes the negative length in the header first, so both methods operate in the same way.
Note that with offer(), the Publication can fragment long messages. With tryClaim(), the onus is on the application not to write messages longer than MTU - Frame header length.
Sender Sends Data
When the Sender runs, it looks for Frames starting at snd-pos. If there is a Frame there, it will start with a Frame header and the first 32 bits will be the Frame length. The Sender reads the 32 bits at snd-pos and if there is a positive Frame length present, it skips forward that number of bytes (after rounding it up to the 32-byte frame alignment) and repeats the process, like following a linked list of Frame lengths.
The Sender stops when it reaches a limit, which is either no more Frames (zero or negative Frame length), the end of the term, the sender limit (snd-lmt) or the MTU length. If it found some Frames, it sends them in a single UDP packet, then updates snd-pos. Let's say it sent 2 of the 3 messages, because they wouldn't all fit in the MTU.
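The scan can be sketched as a loop over frame lengths. This is an illustrative model with hypothetical names: the caller is assumed to pass maxLength as the smaller of the MTU and the space left before snd-lmt, and the term is assumed to hold well-formed, 32-byte aligned frames.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the Sender's scan for complete frames.
final class SenderScanSketch {
    static final int FRAME_ALIGNMENT = 32;

    static int align(int value) {
        return (value + FRAME_ALIGNMENT - 1) & -FRAME_ALIGNMENT;
    }

    // Returns how many bytes of whole frames can be sent starting at offset.
    static int scan(ByteBuffer term, int offset, int maxLength) {
        int available = 0;
        while (offset + available < term.capacity()) {      // stop at the end of the term
            int frameLength = term.getInt(offset + available);
            if (frameLength <= 0) {
                break;                                       // no frame, or frame not yet committed
            }
            int alignedLength = align(frameLength);
            if (available + alignedLength > maxLength) {
                break;                                       // would exceed the MTU or snd-lmt
            }
            available += alignedLength;                      // follow the "linked list" to the next frame
        }
        return available;
    }
}
```

With three 100-byte frames (128 bytes each once aligned) and a maxLength of 300, the scan returns 256, which matches the case of sending 2 of the 3 messages.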
The Sender always sends whole Frames, which means every packet will start with a Frame header. The Frame header acts as a network header - the Receiver can read the Frame header to find out which Image the packet is for and where to insert it in the Image.
If the Sender fails to send the complete packet, it increments the SHORT_SENDS system counter.
If there are no Frames to send, the Sender can send a heartbeat message so the Receiver knows it's still there.
Driver Conductor updates Publisher Counters
This is the same as before, but the Publication has written 3 messages and the Sender has sent 2 messages since the Driver Conductor last ran, so TC0 and snd-pos have advanced. The Driver Conductor sets pub-pos to TC0, and sets pub-lmt to snd-pos + termWindowLength.
Sender receives another SM
At some point, the Receiver will send another SM containing a new min(sub-pos) and receiverWindowLength.
The Sender uses it to advance snd-lmt again. Let's say there was a Subscriber that had only read one of the two messages. The slow Subscriber causes snd-lmt to not advance by as much, restricting how much can be sent.
And so on
The four significant events mentioned earlier continue to happen, and it's essentially just more of the same. The only action that hasn't been covered is cleaning (zeroing) of old messages in the log buffer.
Cleaning
The Driver Conductor cleans old messages in the buffer by overwriting them with zeros. This prepares the buffer for reuse when the Terms rotate and this Term is used again.
Network Publications clean up to snd-pos - termBufferLength. There is no need to clean up to a well-defined position like the start of a Frame, but this does mean if you ever have to look at a log buffer, the first non-zero bytes are unlikely to be a Frame header. Cleaning up to termBufferLength behind snd-pos allows retransmits of data up to a whole Term length in the past.
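As a small worked example of the rule above, the position up to which old data may be zeroed can be computed like this. The names are hypothetical, and clamping at zero for a Publication that has not yet sent a full term is an assumption made for the sketch.

```java
// Hypothetical sketch of the cleaning limit for a Network Publication.
final class CleanPositionSketch {
    static long cleanLimit(long sndPos, int termBufferLength) {
        // Assumption: nothing is cleaned until snd-pos is more than one term ahead of zero.
        return Math.max(0L, sndPos - termBufferLength);
    }
}
```

With a 64KB term and snd-pos at 200000, data before position 134464 is eligible for cleaning, leaving the most recent 65536 bytes available for retransmits.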
The Terms are not cleaned as per the "wear one, wash one, dry one" saying. That may have been the case once, but they are now cleaned with smaller, more frequent cleans. If you ever have to debug an issue and look at a log buffer file, this explains why a lot of the data will be zeros.