Network Publications - Receiving Messages
Subscription and Image Counters
On the Cardinalities page, we saw that there can be many Images and Subscriptions for the same Channel and StreamId. This leads to a potential many-to-many relationship between Subscriptions and Images.
Each Image has two counters: rcv-pos and rcv-hwm. They are related to the Receiver's status in rebuilding the Image, so they are specific to the Image, not any of the Subscriptions linked to it. The counters are shown in the diagram next to what they relate to, but they're actually stored in the cnc.dat file.
A Subscription does not have any counters of its own per se. It has a sub-pos counter for each Image that it is linked to, so each Subscription can be at a different position in a given Image.
Initial State
Let's look at how messages are received from the network and read by a Subscription, but let's focus on one Image and one Subscription. More than one Subscription is just multiple readers of the same Image. More than one Image is just the Subscription polling multiple Images (doing more of the same).
So, given a Subscription that has one Image log buffer associated with it, the counters in cnc.dat and the Image log buffer look like this:
The Terms are empty (zeroed).
There is some metadata in the Image, but it isn't really used by the Subscriptions. The metadata is not sent from the sender's Publication log buffer in the same way that the Term data is. It is initialised with static data like the StreamId and SessionId. The tail counters are not used in the Image, nor is the activeTermCount.
Receiving-related Counters
- rcv-pos is the highest contiguous position the Receiver has reached when rebuilding the Image, i.e. without any gaps. This is the position that Frames can be read up to.
- rcv-hwm is the highest absolute position the Receiver has reached, including gaps. This is literally a high watermark and there may be missing frames below this position.
Subscription-related Counters
- sub-pos is the position up to which the Subscription has returned fragments to the application from this Image.
Significant Events
There are five significant events that can affect the Subscription and this particular Image:
- the Receiver receives a packet starting with a DATA, PAD or RTTM frame and gives it to the Image
- the application polls the Subscription
- the Driver Conductor looks for gaps, updates rcv-pos, cleans the log buffer and calculates the next SM values
- the Receiver sends a NAK for a gap detected by the Driver Conductor
- the Receiver sends a Status Message back to the Sender, using values calculated by the Driver Conductor
The Receiver can also receive a SETUP message from a new Publication, which would cause a new Image to be created, but that's out of scope because we said we're focussing on a Subscription with one Image.
If the application polled the Subscription at this stage, the FragmentHandler that it passed as a parameter would not be invoked and the poll() method would return zero, which is the number of fragments returned. poll() never returns an error code (unlike trying to publish a message), even if poll() was called before the Subscription was linked to any Images.
Receiver receives a DATA frame
When the Receiver polls the ReceiveChannelEndpoint and receives a packet that starts with a DATA frame, it uses the StreamId and SessionId in the Frame header to find the correct Image. It then passes the packet to the Image.
Let's say the packet contains two DATA frames - the first two from the Sending example. The Image reads the TermId and TermOffset from the Frame header in the first DATA frame in the packet, then copies the whole packet to that position. This is safe because when a packet contains multiple Frames, they are always contiguous in the sender's Publication log buffer. Frames after the first one essentially get a free ride and require no additional processing.
The fact that each packet describes its own position in the log buffer means the Receiver can efficiently handle receiving packets out of sequence. It just copies each packet to the position indicated, regardless of which order they are received in. There is no need to copy them to a holding area, waiting for missing frames to arrive.
The rcv-hwm counter is incremented, but rcv-pos is not. The Receiver is on the hot path and needs to get back to receiving packets and copying them to the correct location.
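The copy-by-position idea can be sketched in a few lines of plain Java. This is a simplified model under stated assumptions, not Aeron's implementation: the class ImageInsertSketch, its fields and its methods are all hypothetical, the Image is assumed to have three Terms, and the position is assumed to be (termId - initialTermId) * termLength + termOffset.

```java
/** Hypothetical, simplified model of one Image; not Aeron's actual classes. */
final class ImageInsertSketch {
    final int termLength;
    final int initialTermId;
    final byte[][] terms; // assumed: three Term buffers, all zeroed at the start
    long rcvHwm = 0;      // highest position written so far, gaps included

    ImageInsertSketch(int termLength, int initialTermId) {
        this.termLength = termLength;
        this.initialTermId = initialTermId;
        this.terms = new byte[3][termLength];
    }

    /** Absolute stream position of a (termId, termOffset) pair (assumed formula). */
    long position(int termId, int termOffset) {
        return (long) (termId - initialTermId) * termLength + termOffset;
    }

    /** Copies a whole packet to the place named by its first Frame header. */
    void insertPacket(int termId, int termOffset, byte[] packet) {
        int termIndex = Math.floorMod(termId - initialTermId, terms.length);
        System.arraycopy(packet, 0, terms[termIndex], termOffset, packet.length);

        // Only the high watermark moves here; rcv-pos is left to the Driver Conductor.
        long end = position(termId, termOffset) + packet.length;
        rcvHwm = Math.max(rcvHwm, end);
    }
}
```

If the second packet arrives before the first, both still land in the right place, and rcv-hwm simply stays at the highest position seen.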
PAD frames are handled in the same way as DATA frames. PAD frames will always be at the end of a packet and only the PAD Frame header will be present - the payload of zeros is not transmitted.
The Receiver also checks each Image to see whether the Driver Conductor has requested that a NAK is sent to the Sender. The Driver Conductor does this when it detects a gap in the Image. If so, the Receiver sends the NAK. The Sender should retransmit the missing Frame(s), which the Receiver will process in the same way as any other - inserting them into the Image in the correct position.
Finally, the Receiver checks each Image to see if the Driver Conductor has requested that a Status Message should be sent for it. If so, the Receiver uses the information calculated by the Driver Conductor and sends the SM.
Application polls the Subscription
When the application polls the Subscription, the Subscription polls each of its Images (in a non-predictable order).
The Subscription has a sub-pos counter for each Image, which is the position it has read up to in that Image.
The Subscription reads the Image at the sub-pos position. If there is a Frame there, it will start with a Frame header, and the first 32 bits are the Frame length. A positive 32-bit number at sub-pos indicates the presence of a Frame. The Frame could be a DATA frame or a PAD frame. If there is no message there, there will be zeros at sub-pos, because the Terms start out zeroed and old Frames are cleaned before a Term is reused.
If it is a DATA frame, it will contain a fragment. The fragment is passed back to the application via the FragmentHandler that the application passed in. If it is a PAD frame, it is ignored. When the application polls the Subscription, it also passes in a fragmentLimit, which is the maximum number of fragments to return. Let's say it set fragmentLimit to 1, so at this point, the Subscription stops reading the Image. The last thing it does is update the sub-pos counter to the position it read up to.
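The polling steps above can be sketched as a small stand-alone model. Everything named here is an assumption for illustration: SubscriptionPollSketch, its Handler interface and writeFrame helper are hypothetical, and the header layout (32-byte header, 32-byte alignment, 16-bit type at offset 6, PAD = 0, DATA = 1, little-endian) is assumed rather than taken from this page. The model covers a single Term only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical model of one Subscription reading one Term of one Image. */
final class SubscriptionPollSketch {
    static final int HEADER_LENGTH = 32; // assumed Frame header size in bytes
    static final int ALIGNMENT = 32;     // assumed Frame alignment in bytes
    static final int TYPE_OFFSET = 6;    // assumed offset of the 16-bit type field
    static final int TYPE_PAD = 0;       // assumed type code for PAD
    static final int TYPE_DATA = 1;      // assumed type code for DATA

    /** Stand-in for the application's FragmentHandler. */
    interface Handler {
        void onFragment(ByteBuffer term, int offset, int length);
    }

    final ByteBuffer term; // starts zeroed, like a fresh Term
    int subPos = 0;        // this Subscription's position in the Image

    SubscriptionPollSketch(int termLength) {
        term = ByteBuffer.allocate(termLength).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Stand-in for the Receiver: writes a Frame header and payload at offset. */
    void writeFrame(int offset, int type, int frameLength, byte[] payload) {
        term.putInt(offset, frameLength);
        term.putShort(offset + TYPE_OFFSET, (short) type);
        for (int i = 0; i < payload.length; i++) {
            term.put(offset + HEADER_LENGTH + i, payload[i]);
        }
    }

    /** Reads Frames from sub-pos; returns the number of fragments delivered. */
    int poll(Handler handler, int fragmentLimit) {
        int fragments = 0;
        int offset = subPos;
        while (fragments < fragmentLimit && offset < term.capacity()) {
            int frameLength = term.getInt(offset);
            if (frameLength <= 0) {
                break; // zeros at this position: no Frame here yet
            }
            int type = term.getShort(offset + TYPE_OFFSET) & 0xFFFF;
            if (type == TYPE_DATA) {
                handler.onFragment(term, offset + HEADER_LENGTH, frameLength - HEADER_LENGTH);
                fragments++;
            }
            // PAD frames fall through: skipped, but the position still advances.
            offset += (frameLength + ALIGNMENT - 1) & -ALIGNMENT;
        }
        subPos = offset; // the last step: record how far we read
        return fragments;
    }
}
```

Polling an empty Term returns zero and leaves sub-pos untouched, which matches the behaviour described for the initial state.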
Driver Conductor runs
When the Receiver writes to the Image, it updates rcv-hwm to the highest position written. One of the Driver Conductor's jobs is to check the rebuild status of each Image (check for gaps). This means walking rcv-pos forward from one Frame to the next, until it reaches rcv-hwm or a gap caused by one or more missing Frames.
A Subscription might already have done some of this work, as it walks forward through the Frames in the same way and updates its sub-pos counter. The Driver Conductor takes advantage of this by starting its walk from max(sub-pos), i.e. if there is more than one Subscription reading the Image, it takes the highest sub-pos across them. The Driver Conductor walks forward from there, in the same way that the Subscription did, but going as far as it can. It sets rcv-pos to the position it reaches.
If rcv-pos does not match rcv-hwm, then it found a gap. It would remember the gap, along with a deadline time. If the gap is still present when the deadline passes, the Driver Conductor would copy information about the gap into the Image and would make an entry in loss-report.dat. It's the Receiver's responsibility to check each Image for gap information and send a NAK message.
In our example, there are no gaps, so rcv-pos is updated to rcv-hwm.
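The rebuild walk can be sketched as follows. This is a minimal model, assuming a single Term, a 32-byte Frame alignment and a Frame length stored in the first four bytes of each Frame; RebuildSketch and its methods are hypothetical names, and the gap deadline and NAK bookkeeping are left out.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the Driver Conductor's rebuild walk over one Term. */
final class RebuildSketch {
    static final int ALIGNMENT = 32; // assumed Frame alignment in bytes

    /** Highest sub-pos across all Subscriptions reading the Image. */
    static int maxSubPos(int[] subPositions) {
        int max = 0;
        for (int p : subPositions) {
            max = Math.max(max, p);
        }
        return max;
    }

    /**
     * Walks Frame by Frame from 'from' until rcvHwm or the first gap.
     * The returned value is the new rcv-pos.
     */
    static int rebuildPosition(ByteBuffer term, int from, int rcvHwm) {
        int pos = from;
        while (pos < rcvHwm) {
            int frameLength = term.getInt(pos);
            if (frameLength <= 0) {
                break; // zeros below rcv-hwm: one or more Frames are missing
            }
            pos += (frameLength + ALIGNMENT - 1) & -ALIGNMENT;
        }
        return pos;
    }

    /** A gap exists whenever the walk stops short of the high watermark. */
    static boolean hasGap(int rcvPos, int rcvHwm) {
        return rcvPos != rcvHwm;
    }
}
```

Starting from maxSubPos(...) rather than the old rcv-pos is what lets the walk skip over Frames that a Subscription has already stepped through.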
Finally, the Driver Conductor checks if the slowest Subscription, min(sub-pos), has advanced beyond some threshold. If it has, it does two things:
- it cleans the log buffer to min(sub-pos) - termLength (so it retains a Term length's worth of data behind the slowest Subscription).
- it schedules the next Status Message, which the Receiver will pick up and send. To do this, it stores min(sub-pos) and the windowLength in the Image for the Receiver to use.
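As a rough illustration of the arithmetic, here is a sketch of that final step. The class CleanAndStatusSketch and its fields are hypothetical, the threshold is a plain parameter because the text above only says "some threshold", and never letting the clean position go below zero is an assumption. It expects at least one Subscription position.

```java
/** Hypothetical sketch of the clean-and-schedule step for one Image. */
final class CleanAndStatusSketch {
    final int termLength;
    final int windowLength;
    final long threshold;      // assumption: the real threshold is not given here

    long lastSmPosition = 0;   // min(sub-pos) when the last SM was scheduled
    long cleanPosition = 0;    // the log buffer is cleaned up to here
    long nextSmPosition = -1;  // stored for the Receiver; -1 means nothing pending
    int nextSmWindowLength = 0;

    CleanAndStatusSketch(int termLength, int windowLength, long threshold) {
        this.termLength = termLength;
        this.windowLength = windowLength;
        this.threshold = threshold;
    }

    /** Returns true if cleaning happened and a Status Message was scheduled. */
    boolean track(long[] subPositions) {
        long minSubPos = Long.MAX_VALUE;
        for (long p : subPositions) {
            minSubPos = Math.min(minSubPos, p);
        }
        if (minSubPos - lastSmPosition < threshold) {
            return false; // the slowest Subscription has not moved far enough
        }
        // Keep one Term length of data behind the slowest Subscription.
        cleanPosition = Math.max(cleanPosition, minSubPos - termLength);
        // Leave the values in place for the Receiver to send.
        nextSmPosition = minSubPos;
        nextSmWindowLength = windowLength;
        lastSmPosition = minSubPos;
        return true;
    }
}
```

With a termLength of 1024 and the slowest Subscription at 1500, the buffer would be cleaned up to position 476 and the next SM would carry position 1500.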
And so on
It really is just more of the same.
Join Position
When an Image becomes available for a Subscription, the Subscription's sub-pos counter for that Image needs setting to some initial value. This position is the join position - the position the Subscription joined the Image at, where it will start reading messages from. The join position is set to the lowest position that is 'safe' for the Subscription to start at.
For the first Subscription to a new Publication, the join position will be zero. Zero is a safe position because the Publication will not allow so many messages to be written to the Log Buffer that it wraps and overwrites the first messages before the Subscription has consumed them.
But a new Subscription could be created for a Publication that has already been publishing to other Subscriptions for a while. The Log Buffer may have rotated and reused the first Term buffer, overwriting the first messages at position zero. In this scenario, the new Subscription joins at the position of the slowest consumer (Subscriber or Sender). This position is 'safe' in the sense that the Driver Conductor will not clean the Log Buffer beyond the slowest consumer's position and this adds another consumer at that position. Setting the join position in this way gives the Subscriber the largest amount of history prior to the Publication position.
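The rule boils down to taking a minimum, which a short sketch makes concrete. JoinPositionSketch and joinPosition are hypothetical names, and treating "no existing consumers" as position zero is an assumption that mirrors the first-Subscription case described above.

```java
/** Hypothetical sketch of choosing a join position for a new Subscription. */
final class JoinPositionSketch {
    /**
     * consumerPositions holds the positions of the existing consumers
     * (Subscribers and the Sender). With no consumers yet, join at zero.
     */
    static long joinPosition(long[] consumerPositions) {
        if (consumerPositions.length == 0) {
            return 0L;
        }
        long min = consumerPositions[0];
        for (long p : consumerPositions) {
            min = Math.min(min, p); // the slowest consumer bounds what is still safe
        }
        return min;
    }
}
```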