Add a chapter on auto-reconnect and retry mechanism

Summary: related to T12804

Reviewers: ivica

Reviewed By: ivica

Subscribers: miljen, iljazovic

Differential Revision: https://repo.mireo.local/D28198
Korina Šimičević
2024-02-29 13:03:11 +01:00
parent a1249b433d
commit f34705d74e
4 changed files with 101 additions and 2 deletions


@@ -111,7 +111,8 @@
 [include 01_intro.qbk]
 [include 02_configuring_the_client.qbk]
-[include 03_examples.qbk]
+[include 03_auto_reconnect.qbk]
+[include 04_examples.qbk]
 [include examples/Examples.qbk]


@@ -44,7 +44,8 @@ In the event of a connection failure with one Broker, the Client switches to the
 * [*Offline buffering]: While offline, it automatically buffers all the packets to send when the connection is re-established.
 [heading Example]
-The following example illustrates a simple scenario of configuring the Client and publishing an Application Message.
+The following example illustrates a simple scenario of configuring a Client and publishing a
+"Hello World!" Application Message with `QoS` 0.
 [!c++]
 #include <iostream>

@@ -0,0 +1,97 @@
[section:auto_reconnect Built-in auto-reconnect and retry mechanism]
[nochunk]
The auto-reconnect and retry mechanism is a key feature of the __Self__ library.
It is designed to internally manage the complexities of disconnects, backoffs, reconnections, and message retransmissions.
Handling these tasks manually tends to extend development time, complicate testing, and compromise reliability.
By automating these processes, the __Self__ library enables users of the __Client__ to focus primarily
on the application's functionality without worrying about the repercussions of lost network connectivity.
You can call any asynchronous function within the __Client__ regardless of its current connection status.
If the __Client__ is offline, it will queue all outgoing requests and transmit them as soon as the connection is restored.
In situations where the connection is unexpectedly lost mid-protocol flow,
the __Client__ complies with the MQTT protocol's specified message delivery retry mechanisms.
The following example showcases how the __Client__ internally manages a request to publish a message with QoS 1
in various scenarios: successful transmission, offline queuing, and retransmission after a lost connection.
Note that the same principle applies to all other asynchronous functions within the __Client__
(see /Completion condition/ under [refmem mqtt_client async_publish], [refmem mqtt_client async_subscribe], [refmem mqtt_client async_unsubscribe],
and [refmem mqtt_client async_disconnect]).
// Publishing with QoS 1 involves a two-step process: sending a PUBLISH message to the Broker and awaiting a PUBACK (acknowledgement) response.
// The scenarios that might unfold include:
// a) The Client sends the PUBLISH message immediately.
// b) If the Client is offline when attempting to publish, it queues the PUBLISH message and sends it
// as soon as the connection is re-established.
// c) Should the Client lose connection after sending the PUBLISH message but before receiving a PUBACK,
// it will automatically retransmit the PUBLISH message once connectivity is restored.
client.async_publish<async_mqtt5::qos_e::at_least_once>(
    "my-topic", "Hello world!",
    async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
    [](async_mqtt5::error_code ec, async_mqtt5::reason_code rc, async_mqtt5::puback_props props) {
        // This callback is invoked under any of the following circumstances:
        // a) The Client successfully sends the PUBLISH packet and receives a PUBACK from the Broker.
        // b) The Client encounters a non-recoverable error, such as a cancellation or providing
        //    invalid parameters to async_publish, which prevents the message from being sent.
    }
);
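The same pattern applies to the other asynchronous functions.
For instance, the following is a sketch (assuming the same `client` object as above; the topic and options are illustrative)
of an [refmem mqtt_client async_subscribe] request that, if issued while the __Client__ is offline,
is queued and transmitted once the connection is re-established:
[!c++]
client.async_subscribe(
    async_mqtt5::subscribe_topic { "my-topic", async_mqtt5::subscribe_options {} },
    async_mqtt5::subscribe_props {},
    [](
        async_mqtt5::error_code ec, std::vector<async_mqtt5::reason_code> rcs,
        async_mqtt5::suback_props props
    ) {
        // Invoked once the corresponding SUBACK is received, which may happen only after
        // the connection has been (re-)established, or when a non-recoverable error
        // (e.g. cancellation) prevents the request from being sent.
    }
);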
[section:sentry ]
[endsect] [/sentry]
[section:cons Considerations and limitations]
The integrated auto-reconnect and retry mechanism greatly improves the user experience
by hiding the complexities of connection management and keeping the connection alive without user intervention.
However, it is important to be mindful of certain limitations and considerations associated with this feature.
[heading Delayed handler invocation]
During extended periods of __Client__ downtime, the completion handlers of asynchronous functions,
such as those used with [refmem mqtt_client async_publish], may not be invoked for a considerable time.
Without prompt feedback, users are left in the dark about the status of the requests they have initiated.
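One way to reduce this uncertainty is to pair a request with a deadline.
The following is only a sketch, assuming the same `client` and `io_context` (`ioc`) as in the earlier example,
and that the request is cancelled through Boost.Asio's per-operation cancellation
(a cancelled request completes as a non-recoverable error, as noted above):
[!c++]
boost::asio::cancellation_signal cancel_publish;
boost::asio::steady_timer watchdog(ioc, std::chrono::seconds(30));

client.async_publish<async_mqtt5::qos_e::at_least_once>(
    "my-topic", "Hello world!",
    async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
    boost::asio::bind_cancellation_slot(
        cancel_publish.slot(),
        [&watchdog](async_mqtt5::error_code ec, async_mqtt5::reason_code, async_mqtt5::puback_props) {
            watchdog.cancel(); // the operation completed, stop the watchdog
            // If the watchdog fired first, ec is boost::asio::error::operation_aborted.
            std::cout << "publish completed: " << ec.message() << std::endl;
        }
    )
);

// Cancel the publish if it is still outstanding after the chosen deadline.
watchdog.async_wait([&cancel_publish](boost::system::error_code ec) {
    if (!ec) // the timer expired before being cancelled
        cancel_publish.emit(boost::asio::cancellation_type::terminal);
});

Whether giving up on a message after a timeout is acceptable, and what the deadline should be, depends entirely on the application.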
[heading Concealing configuration-related issues]
The __Client__ will always try to reconnect to the Broker(s) regardless of the reason why the connection was previously closed.
This is desirable behaviour when the connection gets dropped due to underlying stream transport issues,
such as when a device connected to the network loses its GSM connectivity.
However, the connection may be closed (or never established) for other reasons,
typically a misconfigured Broker IP or port, an expired or incorrect TLS certificate, or an MQTT-related error,
such as trying to communicate with a Broker that does not support MQTT 5.
In these cases, the __Client__ will still endlessly try to connect to the Broker(s), but the connection will never succeed.
The most challenging problem here is that users of the __Client__ are not informed in any way that the connection cannot be established.
So, if you make a typo in the Broker's IP, run the __Client__, and publish a message, the [refmem mqtt_client async_publish] callback will never be invoked,
and you will neither "catch" the error nor detect the root cause of the issue.
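To make the scenario concrete, here is a minimal sketch (the hostname is deliberately misspelled and purely illustrative;
the setup mirrors the earlier TCP examples): the program runs, the __Client__ keeps retrying the connection,
and the publish handler is never invoked.
[!c++]
#include <iostream>

#include <boost/asio/detached.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/ip/tcp.hpp>

#include <async_mqtt5.hpp>

int main() {
    boost::asio::io_context ioc;
    async_mqtt5::mqtt_client<boost::asio::ip::tcp::socket> client(ioc);

    // Typo in the hostname: the Client will retry the connection indefinitely.
    client.brokers("mqtt-borker.example.com", 1883)
        .async_run(boost::asio::detached);

    client.async_publish<async_mqtt5::qos_e::at_least_once>(
        "my-topic", "Hello world!",
        async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
        [](async_mqtt5::error_code ec, async_mqtt5::reason_code, async_mqtt5::puback_props) {
            // Never reached while the Broker address cannot be resolved.
            std::cout << ec.message() << std::endl;
        }
    );

    ioc.run();
}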
A possible alternative approach, in which the __Client__ would report something like an "unrecoverable error"
when you try to publish a message to a misconfigured Broker, would have an undesirable consequence whenever the problem lies on the Broker's side.
For example, suppose someone forgets to renew the TLS certificate on the Broker.
The connection will be broken in that case, and the __Client__ would report an "unrecoverable error" through [refmem mqtt_client async_publish].
Now, the expired TLS certificate on the Broker is most probably a temporary issue,
so it is natural that the __Client__ would try to reconnect until the certificate gets renewed.
But, if the __Client__ stops retrying when it detects such an "unrecoverable error," then the decision of when to reconnect would be left to the user.
By design, one of the main functional requirements of the __Client__ was to handle reconnection steps automatically and correctly.
If the decision for reconnection were left to the user, then the user would need to handle all those error states manually,
which would dramatically increase the complexity of the user's code, not to mention how difficult it would be to cover all possible error states.
The proposed approach for detecting configuration errors is to use a simple logging facility during development.
Inject log lines directly into the __Client__ code (typically in the connect_op.hpp file); the logs will then uncover any misconfiguration.
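For instance, a temporary helper like the one below (purely hypothetical; the variable names
and the exact place to call it from depend on the library version) can be invoked from the connect step
to print why each connection attempt failed:
[!c++]
#include <iostream>
#include <string>

#include <boost/system/error_code.hpp>

// Hypothetical debugging helper; call it from the Client's internal connect step
// (e.g. in connect_op.hpp) with whatever broker address and error code are available there.
inline void log_connect_result(const std::string& broker, boost::system::error_code ec) {
    std::cerr << "[mqtt_client] connect attempt to " << broker
        << " finished with: " << ec.message() << std::endl;
}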
[heading Increased resource consumption]
The __Client__ is designed to automatically buffer requests that are initiated while it is offline.
During extended downtime or when a high volume of requests accumulates, this can lead to an increase in memory usage.
This is especially significant for devices with limited resources, where growing memory consumption can impact performance and functionality.
[/ TODO: link to the debugging the client chapter ]
[endsect] [/cons]
[endsect] [/auto_reconnect]