How resilient should a bot be?

cris-dsc · November 11, 2023, 9:48pm

I usually make my bots either restart execution or stop it altogether when they throw an error. The first option makes the bot resilient, but it repeats tasks from the start, which may be unnecessary and time-consuming. The second option makes the bot the opposite of resilient, but it’s easier to identify and understand the problem, which then helps me solve it more quickly.

So now I’m trying to find a middle ground. I want to know how resilient my bots need to be, with less unnecessary repetition of tasks and fewer interruptions. How do I determine when the bot should restart and when it should retry from a certain point in the process? What would be the criteria?

How do you deal with bot resiliency? Can you share your thoughts with me? Thank you!

Alex · January 12, 2024, 10:02am

Hi, when building bots, resilience is the top priority from my point of view. Performance comes second.
So when we design and build bots, our solution design follows a pattern where we always ensure that the bot will go all the way to the end of the code, successful or not, but we obviously need to capture whether it was successful or not.

So what we do is, at the start of the run, we clean up the estate, then check if we have new inputs to process, and if we do, we log into the target systems.
Then we loop through each transaction, and at the end we close all target systems and clean up the estate for the next run.

If something goes wrong during a transaction, we mark it as failed and we loop back to the top to clean up the estate and restart all target systems, and round we go again.
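
As a rough illustration, the run pattern described above could be sketched like this in Python. It is only a sketch under assumptions: `run_bot`, `RunReport`, and the five callables passed in (`fetch_inputs`, `process`, `clean_up`, `log_in`, `log_out`) are hypothetical names made up for the example and do not come from any particular RPA tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RunReport:
    """Hypothetical record of which transactions succeeded or failed."""
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def run_bot(
    fetch_inputs: Callable[[], Iterable[str]],
    process: Callable[[str], None],
    clean_up: Callable[[], None],
    log_in: Callable[[], None],
    log_out: Callable[[], None],
) -> RunReport:
    """Always run to the end, recording the outcome of every transaction."""
    report = RunReport()

    clean_up()                      # start from a known-clean estate
    pending = list(fetch_inputs())  # check for new inputs
    if not pending:
        return report               # nothing to do, no need to log in

    log_in()                        # open the target systems
    while pending:
        item = pending.pop(0)
        try:
            process(item)
            report.succeeded.append(item)
        except Exception:
            # Mark the transaction as failed, then reset everything
            # before moving on to the next one.
            report.failed.append(item)
            log_out()
            clean_up()
            log_in()

    log_out()                       # close the target systems
    clean_up()                      # leave the estate clean for the next run
    return report
```

The failed transaction is recorded rather than retried here; whether to retry it, and how many times, is a separate decision that this sketch leaves out.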

This may not seem like the most performant way to do it, but it’s very resilient, and resilience means more uptime, so in the long run it’s actually pretty good on performance as well.

Hope this helps