Implementing exponential backoff
This page explains how to use truncated exponential backoff to ensure your devices do not generate excessive load.
When devices retry calls without waiting, they can produce a heavy load on the ClearBlade IoT Core servers. ClearBlade IoT Core automatically limits projects that generate excessive load. Even a small fraction of overactive devices can trigger limits that affect all devices in the same Google Cloud project.
You are strongly encouraged to implement truncated exponential backoff with jitter (a random amount added to each wait time) to avoid triggering these limits. If you have questions or would like to discuss the specifics of your algorithm, email iotcore@clearblade.com with the following information:
IoT Core registry name
Number of devices connected
Industry
Truncated exponential backoff is a standard error-handling strategy for network applications: the client periodically retries a failed request, increasing the delay between attempts. Clients should use truncated exponential backoff for all requests to ClearBlade IoT Core that return HTTP 5xx or 429 response codes, as well as for disconnections from the MQTT server.
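The retry condition above can be expressed as a small predicate. This is a minimal sketch assuming the client has the HTTP status code as a plain integer; `is_retryable_status` is a hypothetical helper name, and MQTT disconnections would be handled separately by the client's reconnect logic.

```python
def is_retryable_status(status_code: int) -> bool:
    """Hypothetical helper: should this HTTP status be retried with backoff?

    Returns True for 429 (Too Many Requests) and for every 5xx server error.
    All other codes, including other 4xx errors, return False in this sketch.
    """
    return status_code == 429 or 500 <= status_code <= 599


if __name__ == "__main__":
    for code in (200, 400, 404, 429, 500, 503):
        print(code, is_retryable_status(code))
```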
Example algorithm
An exponential backoff algorithm retries requests exponentially, increasing the waiting time between retries up to a maximum backoff time. For example:
1. Make a request to ClearBlade IoT Core.
2. If the request fails, wait 1 second + random_number_milliseconds and retry the request.
3. If the request fails, wait 2 seconds + random_number_milliseconds and retry the request.
4. If the request fails, wait 4 seconds + random_number_milliseconds and retry the request.
5. And so on, up to a maximum_backoff time.
6. Continue waiting and retrying up to some maximum number of retries, but do not increase the waiting period between retries.
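The steps above can be sketched in Python as a retry loop. This is a minimal, non-authoritative sketch assuming the request is a callable that raises an exception on a retryable failure (for example a 5xx or 429 response); `call_with_backoff` and `RetryableError` are hypothetical names, not part of any ClearBlade SDK. The `sleep` parameter is injectable so the loop can be exercised without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error a request function raises for a retryable failure."""


def call_with_backoff(
    request: Callable[[], T],
    maximum_backoff: float = 64.0,
    max_retries: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `request`, retrying with truncated exponential backoff.

    The first retry waits about 1 s, then 2 s, 4 s, ... plus up to 1 s of
    random jitter, capped at `maximum_backoff`. After `max_retries` retries
    the last RetryableError is re-raised.
    """
    n = 0
    while True:
        try:
            return request()
        except RetryableError:
            if n >= max_retries:
                raise  # give up instead of retrying indefinitely
            # random_number_milliseconds, converted to seconds
            jitter = random.randint(0, 1000) / 1000.0
            sleep(min(2 ** n + jitter, maximum_backoff))
            n += 1


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request() -> str:
        # Simulated request: fails twice with a retryable error, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RetryableError("HTTP 503")
        return "ok"

    waits = []
    print(call_with_backoff(flaky_request, sleep=waits.append))
    print(waits)  # two delays: roughly 1-2 s, then 2-3 s
```

Catching only a dedicated exception type keeps non-retryable failures (such as a 404) from being retried at all.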
where:
The wait time, in seconds, is min(2^n + random_number_milliseconds / 1000, maximum_backoff), with n starting at 0 and incremented by 1 for each iteration (request).
random_number_milliseconds is a random number of milliseconds less than or equal to 1000. It helps avoid cases in which some event synchronizes many clients so that they all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated after each retry request.
maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.
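The wait-time formula can be computed directly. This sketch assumes the random milliseconds are drawn uniformly from 0 to 1000 inclusive and converted to seconds before being added to 2^n; `backoff_delay` is a hypothetical function name, and the optional `rng` argument exists only so a seed can make the demo repeatable.

```python
import random
from typing import Optional


def backoff_delay(n: int, maximum_backoff: float = 64.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return the wait in seconds before retry n (n = 0 for the first retry).

    Computes min(2^n + random_number_milliseconds / 1000, maximum_backoff).
    A fresh random value is drawn on every call.
    """
    if rng is None:
        rng = random.Random()
    random_number_milliseconds = rng.randint(0, 1000)  # 0..1000 inclusive
    return min(2 ** n + random_number_milliseconds / 1000.0, maximum_backoff)


if __name__ == "__main__":
    demo_rng = random.Random(0)  # fixed seed only so the demo output is repeatable
    for n in range(8):
        print(n, round(backoff_delay(n, 64.0, demo_rng), 3))
```

With maximum_backoff set to 64, every value for n ≥ 6 is exactly 64.0, because 2^6 already equals the cap.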
The client can continue retrying after it has reached the maximum_backoff time. Retries after this point do not need to continue increasing the backoff time. For example, if a client uses a maximum_backoff time of 64 seconds, then after reaching this value it can retry every 64 seconds. However, clients should eventually stop retrying rather than retry indefinitely.
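To illustrate the plateau and the stopping point, here is a hedged sketch of a generator, the hypothetical `retry_delays`, that yields the wait before each retry. It assumes a fixed retry count as the stopping rule; a deadline on total elapsed time would be an equally reasonable choice.

```python
import random
from typing import Iterator


def retry_delays(maximum_backoff: float = 64.0, max_retries: int = 20) -> Iterator[float]:
    """Yield the wait in seconds before each retry, stopping after max_retries.

    Delays grow as 2^n seconds plus up to 1 s of jitter until they reach
    maximum_backoff; every later retry waits exactly maximum_backoff, so the
    delay no longer increases. max_retries keeps the client from retrying
    indefinitely.
    """
    for n in range(max_retries):
        jitter = random.randint(0, 1000) / 1000.0  # random_number_milliseconds in seconds
        yield min(2 ** n + jitter, maximum_backoff)


if __name__ == "__main__":
    print([round(d, 3) for d in retry_delays(maximum_backoff=64.0, max_retries=10)])
```

With maximum_backoff=64 and max_retries=10, the first six delays fall between 2^n and 2^n + 1 seconds and the last four are exactly 64.0.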
The wait time between retries and the number of retries depend on your use case and network conditions.