Add a chapter on auto-reconnect and retry mechanism

Summary: related to T12804

Reviewers: ivica

Reviewed By: ivica

Subscribers: miljen, iljazovic

Differential Revision: https://repo.mireo.local/D28198
Korina Šimičević
2024-02-29 13:03:11 +01:00
parent a1249b433d
commit f34705d74e
4 changed files with 101 additions and 2 deletions


@@ -111,7 +111,8 @@
 [include 01_intro.qbk]
 [include 02_configuring_the_client.qbk]
-[include 03_examples.qbk]
+[include 03_auto_reconnect.qbk]
+[include 04_examples.qbk]
 [include examples/Examples.qbk]


@@ -44,7 +44,8 @@ In the event of a connection failure with one Broker, the Client switches to the
 * [*Offline buffering]: While offline, it automatically buffers all the packets to send when the connection is re-established.
 [heading Example]
-The following example illustrates a simple scenario of configuring the Client and publishing an Application Message.
+The following example illustrates a simple scenario of configuring a Client and publishing a
+"Hello World!" Application Message with `QoS` 0.
 [!c++]
 #include <iostream>

@@ -0,0 +1,97 @@
[section:auto_reconnect Built-in auto-reconnect and retry mechanism]
[nochunk]
The auto-reconnect and retry mechanism is a key feature of the __Self__ library.
It is designed to internally manage the complexities of disconnects, backoffs, reconnections, and message retransmissions.
Handling these tasks manually tends to extend development time, complicate testing, and compromise reliability.
By automating these processes, the __Self__ library enables users of the __Client__ to focus primarily
on the application's functionality without worrying about the repercussions of lost network connectivity.
You can call any asynchronous function within the __Client__ regardless of its current connection status.
If the __Client__ is offline, it will queue all outgoing requests and transmit them as soon as the connection is restored.
In situations where the connection is unexpectedly lost mid-protocol flow,
the __Client__ complies with the MQTT protocol's specified message delivery retry mechanisms.
The following example showcases how the __Client__ internally manages a request to publish a message with QoS 1
in various scenarios: successful transmission, offline queuing, and retransmission after a lost connection.
Note that the same principle applies to all other asynchronous functions within the __Client__
(see /Completion condition/ under [refmem mqtt_client async_publish], [refmem mqtt_client async_subscribe], [refmem mqtt_client async_unsubscribe],
and [refmem mqtt_client async_disconnect]).
// Publishing with QoS 1 involves a two-step process: sending a PUBLISH message to the Broker and awaiting a PUBACK (acknowledgement) response.
// The scenarios that might unfold include:
// a) The Client sends the PUBLISH message immediately.
// b) If the Client is offline when attempting to publish, it queues the PUBLISH message and sends it
// as soon as the connection is re-established.
// c) Should the Client lose connection after sending the PUBLISH message but before receiving a PUBACK,
// it will automatically retransmit the PUBLISH message once connectivity is restored.
client.async_publish<async_mqtt5::qos_e::at_least_once>(
    "my-topic", "Hello world!",
    async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
    [](async_mqtt5::error_code ec, async_mqtt5::reason_code rc, async_mqtt5::puback_props props) {
        // This callback is invoked under any of the following circumstances:
        // a) The Client successfully sends the PUBLISH packet and receives a PUBACK from the Broker.
        // b) The Client encounters a non-recoverable error, such as a cancellation or providing
        //    invalid parameters to async_publish, which prevents the message from being sent.
    }
);
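The same pattern applies to the other asynchronous functions.
For instance, the following is a sketch (assuming the same `client` object as above; the topic and options are illustrative)
of an [refmem mqtt_client async_subscribe] request that, if issued while the __Client__ is offline,
is queued and transmitted once the connection is re-established:
[!c++]
client.async_subscribe(
    async_mqtt5::subscribe_topic { "my-topic", async_mqtt5::subscribe_options {} },
    async_mqtt5::subscribe_props {},
    [](
        async_mqtt5::error_code ec, std::vector<async_mqtt5::reason_code> rcs,
        async_mqtt5::suback_props props
    ) {
        // Invoked once the corresponding SUBACK is received, which may happen only after
        // the connection has been (re-)established, or when a non-recoverable error
        // (e.g. cancellation) prevents the request from being sent.
    }
);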
[section:sentry ]
[endsect] [/sentry]
[section:cons Considerations and limitations]
The integrated auto-reconnect and retry mechanism greatly improves the user experience
by hiding the complexities of connection management and keeping the connection alive without user intervention.
However, it is important to be mindful of certain limitations and considerations associated with this feature.
[heading Delayed handler invocation]
During extended periods of __Client__ downtime, the completion handlers of asynchronous functions,
such as those used with [refmem mqtt_client async_publish], may not be invoked for a considerable time.
Without prompt feedback, users are left in the dark about the status of the requests they have initiated.
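One way to reduce this uncertainty is to pair a request with a deadline.
The following is only a sketch, assuming the same `client` and `io_context` (`ioc`) as in the earlier example,
and that the request is cancelled through Boost.Asio's per-operation cancellation
(a cancelled request completes as a non-recoverable error, as noted above):
[!c++]
boost::asio::cancellation_signal cancel_publish;
boost::asio::steady_timer watchdog(ioc, std::chrono::seconds(30));

client.async_publish<async_mqtt5::qos_e::at_least_once>(
    "my-topic", "Hello world!",
    async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
    boost::asio::bind_cancellation_slot(
        cancel_publish.slot(),
        [&watchdog](async_mqtt5::error_code ec, async_mqtt5::reason_code, async_mqtt5::puback_props) {
            watchdog.cancel(); // the operation completed, stop the watchdog
            // If the watchdog fired first, ec is boost::asio::error::operation_aborted.
            std::cout << "publish completed: " << ec.message() << std::endl;
        }
    )
);

// Cancel the publish if it is still outstanding after the chosen deadline.
watchdog.async_wait([&cancel_publish](boost::system::error_code ec) {
    if (!ec) // the timer expired before being cancelled
        cancel_publish.emit(boost::asio::cancellation_type::terminal);
});

Whether giving up on a message after a timeout is acceptable, and what the deadline should be, depends entirely on the application.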
[heading Concealing configuration-related issues]
The __Client__ will always try to reconnect to the Broker(s) regardless of the reason why the connection was previously closed.
This is desirable behaviour when the connection gets dropped due to underlying stream transport issues,
such as when a device connected to the network loses its GSM connectivity.
However, the connection may be closed (or never established) for other reasons,
typically a misconfigured Broker IP or port, an expired or incorrect TLS certificate, or an MQTT-related error,
such as trying to communicate with a Broker that does not support MQTT 5.
In these cases, the __Client__ will still endlessly try to connect to the Broker(s), but the connection will never succeed.
The most challenging problem here is that users of the __Client__ are not informed in any way that the connection cannot be established.
So, if you make a typo in the Broker's IP, run the __Client__, and publish a message, the [refmem mqtt_client async_publish] callback will never be invoked,
and you will neither "catch" the error nor detect the root cause of the issue.
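To make the scenario concrete, here is a minimal sketch (the hostname is deliberately misspelled and purely illustrative;
the setup mirrors the earlier TCP examples): the program runs, the __Client__ keeps retrying the connection,
and the publish handler is never invoked.
[!c++]
#include <iostream>

#include <boost/asio/detached.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/ip/tcp.hpp>

#include <async_mqtt5.hpp>

int main() {
    boost::asio::io_context ioc;
    async_mqtt5::mqtt_client<boost::asio::ip::tcp::socket> client(ioc);

    // Typo in the hostname: the Client will retry the connection indefinitely.
    client.brokers("mqtt-borker.example.com", 1883)
        .async_run(boost::asio::detached);

    client.async_publish<async_mqtt5::qos_e::at_least_once>(
        "my-topic", "Hello world!",
        async_mqtt5::retain_e::no, async_mqtt5::publish_props {},
        [](async_mqtt5::error_code ec, async_mqtt5::reason_code, async_mqtt5::puback_props) {
            // Never reached while the Broker address cannot be resolved.
            std::cout << ec.message() << std::endl;
        }
    );

    ioc.run();
}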
A possible alternative approach, in which the __Client__ would report something like an "unrecoverable error"
when you try to publish a message to a misconfigured Broker, would have an undesirable consequence whenever the problem lies on the Broker's side.
For example, suppose someone forgets to renew the TLS certificate on the Broker.
The connection will be broken in that case, and the __Client__ would report an "unrecoverable error" through [refmem mqtt_client async_publish].
Now, the expired TLS certificate on the Broker is most probably a temporary issue,
so it is natural that the __Client__ would try to reconnect until the certificate gets renewed.
But, if the __Client__ stops retrying when it detects such an "unrecoverable error," then the decision of when to reconnect would be left to the user.
By design, one of the main functional requirements of the __Client__ was to handle reconnection steps automatically and correctly.
If the decision for reconnection were left to the user, then the user would need to handle all those error states manually,
which would dramatically increase the complexity of the user's code, not to mention how difficult it would be to cover all possible error states.
The proposed approach for detecting configuration errors is to use a simple logging facility during development.
Inject log lines directly into the __Client__ code (typically in the connect_op.hpp file); the logs will then uncover any misconfiguration.
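For instance, a temporary helper like the one below (purely hypothetical; the variable names
and the exact place to call it from depend on the library version) can be invoked from the connect step
to print why each connection attempt failed:
[!c++]
#include <iostream>
#include <string>

#include <boost/system/error_code.hpp>

// Hypothetical debugging helper; call it from the Client's internal connect step
// (e.g. in connect_op.hpp) with whatever broker address and error code are available there.
inline void log_connect_result(const std::string& broker, boost::system::error_code ec) {
    std::cerr << "[mqtt_client] connect attempt to " << broker
        << " finished with: " << ec.message() << std::endl;
}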
[heading Increased resource consumption]
The __Client__ is designed to automatically buffer requests that are initiated while it is offline.
During extended downtime or when a high volume of requests accumulates, this can lead to an increase in memory usage.
This is especially significant for devices with limited resources, where growing memory consumption can impact performance and functionality.
[/ TODO: link to the debugging the client chapter ]
[endsect] [/cons]
[endsect] [/auto_reconnect]