Skip to content

Network Publications - Receiving Messages

Subscription and Image Counters

On the Cardinalities page, we saw that there can be many Images and Subscriptions for the same Channel and StreamId. This leads to a potential many-to-many relationship between Subscriptions and Images.

Each Image has two counters: rcv-pos and rcv-hwm. They are related to the Receiver's status in rebuilding the Image, so they are specific to the Image, not any of the Subscriptions linked to it. The counters are shown in the diagram next to what they relate to, but they're actually stored in the cnc.dat file.

A Subscription does not have any counters of its own per se. It has a sub-pos counter for each Image that it is linked to, so each Subscription can be at a different position in a given Image.

images/47.logbuffer rcv-pos, rcv-hwm images/15.logbuffer rcv-pos, rcv-hwm Subscription 6 Subscription 5 Receiver sub-pos sub-pos sub-pos sub-pos

Initial State

Let's look at how messages are received from the network and read by a Subscription, but let's focus on one Image and one Subscription. More than one Subscription is just multiple readers of the same Image. More than one Image is just the Subscription polling multiple Images (doing more of the same).

So, given a Subscription that has one Image log buffer associated with it, the counters in cnc.dat and the Image log buffer look like this:

rcv-hwm rcv-pos sub-pos Term 2 Term 1 Term 0 images/51.logbuffer cnc.dat: rcv-pos, rcv-hwm sub-pos

The Terms are empty (zeroed).

There is some metadata in the Image, but it isn't really used by the Subscriptions. The metadata is not sent from the sender's Publication log buffer in the same way that the Term data is. It is initialised with static data like the StreamId and SessionId. The tail counters are not used in the Image, nor is the activeTermCount.

  • rcv-pos is the highest contiguous position the Receiver has reached when rebuilding the Image, i.e. without any gaps. This is the position that Frames can be read up to.
  • rcv-hwm is the highest absolute position the Receiver has reached, including gaps. This is literally a high watermark and there may be missing frames below this position.
  • sub-pos is the position up to which the Subscription has returned fragments to the application from this Image.

Significant Events

There are five significant events that can affect the Subscription and this particular Image:

  • the Receiver receives a packet starting with a DATA, PAD or RTTM frame and gives it to the Image
  • the application polls the Subscription
  • the Driver Conductor looks for gaps, updates rcv-pos, cleans the log buffer and calculates the next SM values
  • the Receiver sends a NAK for a gap detected by the Driver Conductor
  • the Receiver sends a Status Message back to the Sender, using values calculated by the Driver Conductor

The Receiver can also receive a SETUP message from a new Publication, which would cause a new Image to be created, but that's out of scope because we said we're focussing on a Subscription with one Image.

If the application polled the Subscription at this stage, the FragmentHandler that it passed as a parameter would not be invoked and the poll() method would return zero, which is the number of fragments returned. poll() never returns an error code (unlike trying to publish a message), even if poll() was called before the Subscription was linked to any Images.

Receiver receives a DATA frame

When the Receiver polls the ReceiveChannelEndpoint and receives a packet that starts with a DATA frame, it uses the StreamId and SessionId in the Frame header to find the correct Image. It then passes the packet to the Image.

Let's say the packet contains two DATA frames - the first two from the Sending example. The Image reads the TermId and TermOffset from the Frame header in the first DATA frame in the packet, then copies the whole packet to that position. This is safe because when a packet contains multiple Frames, they will always come from contiguous Frames in the sender's Publication log buffer. Frames after the first one essentially get a free ride and require no additional processing.

The fact that each packet describes its own position in the log buffer means the Receiver can efficiently handle receiving packets out of sequence. It just copies each packet to the position indicated, regardless of which order they are received in. There is no need to copy them to a holding area, waiting for missing frames to arrive.

The rcv-hwm counter is incremented, but rcv-pos is not. The Receiver is on the hot path and needs to get back to receiving packets and copying them to the correct location.

PAD frames are handled in the same way as DATA frames. PAD frames will always be at the end of a packet and only the PAD Frame header will be present - the payload of zeros is not transmitted.

The Receiver also checks each Image to see whether the Driver Conductor has requested that a NAK is sent to the Sender. The Driver Conductor does this when it detects a gap in the Image. If so, the Receiver sends the NAK. The Sender should retransmit the missing Frame(s), which the Receiver will process in the same way as any other - inserting them into the Image in the correct position.

Finally, the Receiver checks each Image to see if the Driver Conductor has requested that a Status Message should be sent for it. If so, the Receiver uses the information calculated by the Driver Conductor and sends the SM.

rcv-hwm rcv-pos sub-pos Term 2 Term 1 Term 0 images/51.logbuffer cnc.dat: rcv-pos, sub-pos rcv-hwm

Application polls the Subscription

When the application polls the Subscription, the Subscription polls each of its Images (in a non-predictable order). The Subscription has a sub-pos counter for each Image, which is the position it has read up to in that Image.

The Subscription reads the Image at the sub-pos position. If there is a Frame there, it will start with a Frame header, and the first 32 bits are the Frame length. A positive 32-bit number at sub-pos indicates the presence of a Frame. The Frame could be a DATA frame or a PAD frame. If there is no message there, there will be zeros at sub-pos, because the Terms start out zeroed and old Frames are cleaned before a Term is reused.

If it is a DATA frame, it will contain a fragment. The fragment is passed back to the application via the FragmentHandler that the application passed in. If it is a PAD frame, it is ignored. When the application polls the Subscription, it also passes in a fragmentLimit, which is the maximum number of fragments to return. Let's say it set fragmentLimit to 1, so at this point, the Subscription stops reading the Image. The last thing it does is update the sub-pos counter to the position it read up to.

rcv-hwm rcv-pos sub-pos Term 2 Term 1 Term 0 images/51.logbuffer cnc.dat: rcv-pos, rcv-hwm sub-pos

Driver Conductor runs

When the Receiver writes to the Image, it updates rcv-hwm to the highest position written. One of the Driver Conductor's jobs is to check the rebuild status of each Image (check for gaps). This means walking rcv-pos forward from one Frame to the next, until it reaches rcv-hwm or a gap caused by one or more missing Frames.

A Subscription might already have done some of this work, as they walk forward through the Frames in the same way and update their sub-pos counter. The Driver Conductor takes advantage of this by starting its walk from max(sub-pos), i.e. if there is more than one Subscription reading the Image, it takes the highest sub-pos across them. The Driver Conductor walks forward from there, in the same way that the Subscription did, but going as far as it can. It sets rcv-pos to the position it reaches.

If rcv-pos does not match rcv-hwm, then it found a gap. It would remember the gap, along with a deadline time. If the gap is still present when the deadline passes, the Driver Conductor would copy information about the gap into the Image and would make an entry in loss-report.dat. It's the Receiver's responsibility to check each Image for gap information and send a NAK message.

In our example, there are no gaps, so rcv-pos is updated to rcv-hwm.

Finally, the Driver Conductor checks if the slowest Subscription min(sub-pos) has advanced beyond some threshold. If it has, it does two things:

  • it cleans the log buffer to min(sub-pos) - termLength (so it retains a Term length's worth of data behind the slowest Subscription).
  • it schedules the next Status Message, which the Receiver will pick up and send. To do this, it stores min(sub-pos) and the windowLength in the Image for the Receiver to use.

rcv-hwm rcv-pos sub-pos Term 2 Term 1 Term 0 images/51.logbuffer cnc.dat: rcv-hwm sub-pos rcv-pos,

And so on

It really is just more of the same.

Join Position

When an Image becomes available for a Subscription, the Subscription's sub-pos counter for that Image needs setting to some initial value. This position is the join position - the position the Subscription joined the Image at, where it will start reading messages from. The join position is set to the lowest position that is 'safe' for the Subscription to start at.

For the first Subscription to a new Publication, the join position will be zero. Zero is a safe position because the Publication will not allow so many messages to be written to the Log Buffer that it wraps and overwrites the first messages before the Subscription has consumed them.

But a new Subscription could be created for a Publication that has already been publishing to other Subscriptions for a while. The Log Buffer may have rotated and reused the first Term buffer, overwriting the first messages at position zero. In this scenario, the new Subscription joins at the position of the slowest consumer (Subscriber or Sender). This position is 'safe' in the sense that the Driver Conductor will not clean the Log Buffer beyond the slowest consumer's position and this adds another consumer at that position. Setting the join position in this way gives the Subscriber the largest amount of history prior to the Publication position.