What is the behavior of the eventing service when, for some reason, the curl endpoint is unavailable?
What I would expect is that it reads the response code and if it is an unsuccessful code, the function will be retried later for that particular document. Is this assumption correct? If yes, what is the retry procedure implemented by the eventing service?
Or is there some sort of acknowledgement mechanism that should be implemented manually (something like flowController.ack(event) of the DCP client)?
The documentation does not mention anything regarding error handling.
As of now, we do not retry automatically when a curl call fails. As with code in any other system, you can enclose the curl call in a while loop and keep retrying until it succeeds or until a predefined threshold is reached.
var counter = 0; // number of failed attempts so far
while (counter < 5) {
    try {
        var result = curl(...);
        break; // success: stop retrying
    } catch (e) {
        counter++;
        log("Exception:", e);
    }
}
We plan to include a counter for max-retries when this feature goes GA.
We did not document the native curl support because it is still in Developer Preview and, as you may have noticed in the blog post as well, it is not recommended for production environments. We are still fine-tuning the design, and the API signature may change in the future to accommodate retries and other features.
Thanks again and feel free to provide any other feedback you may have with the Eventing Service.
@venkat Thank you for your answer. I understand how it works now. So, if I cannot lose any mutation, would it be reasonable to put the function in an infinite loop with a sleep() of 10s in the catch block? After the curl() endpoint is up again, will the eventing service resume the work properly? I need to make sure that every document will eventually reach the endpoint.
I would not recommend an infinite loop. Think of the logic as you would implement it if Eventing were not there, for example in the middleware. Infinite loops eat up system resources and are not an efficient compute pattern.
OnUpdate() is going to be called for every document mutation; if you sleep and retry indefinitely for every mutation, you will nuke the cluster.
The best recommendation is: retry a few times (up to a threshold), and if it still fails, write a log() statement. Remember that you also have application-specific log files, which give you isolation. This way, you will not lose track of a document that was not processed due to a runtime error. You can have a separate job that reads the log file, but this is purely for reconciliation.
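To make that recommendation concrete, here is a minimal sketch of a bounded retry that falls back to a log entry. The helper name deliverWithRetry, its parameters, and the DELIVERY_FAILED marker are hypothetical and are not part of the Eventing API; send stands in for whatever wraps your curl call, and logFn stands in for log().

```javascript
// Hypothetical helper (not part of the Eventing API): attempt a delivery a
// bounded number of times, then record the document id and give up.
// `send` is an assumed callback wrapping the outbound call; it must throw on failure.
// `logFn` is an assumed logging callback, e.g. Eventing's log().
function deliverWithRetry(docId, doc, send, maxAttempts, logFn) {
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            send(doc);
            return true; // delivered: stop retrying
        } catch (e) {
            logFn("Attempt " + attempt + " failed for " + docId + ":", e);
        }
    }
    // Threshold reached: leave a marker that a reconciliation job can search for.
    logFn("DELIVERY_FAILED", docId);
    return false;
}
```

Inside OnUpdate(doc, meta) you would pass meta.id as docId, a function that performs the curl call as send, and log as logFn. A separate job can then scan the application log for the DELIVERY_FAILED marker, as suggested above.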
@venkat Thank you for the answer. I think that it would be really useful to have some sort of ack mechanism that enables the control of what was delivered and some sort of re-delivery mechanism for unacked messages. Otherwise the eventing functionality will only be useful for non-critical purposes. Can you think of any way I could implement a fault-tolerant function that communicates with external systems?