Error Handling
One obvious use for loops is to retry some logic if an error occurs, but there is no need to design a looping flow for this: Direktiv provides configurable error catching and retrying on every action-based state.
Handling errors can be an important part of a flow.
Demo
direktiv_api: workflow/v1
functions:
- id: http-request
  image: gcr.io/direktiv/functions/http-request:1.0
  type: knative-workflow
states:
- id: do-request
  type: action
  action:
    function: http-request
    input:
      url: http://doesnotexist.xy
    retries:
      max_attempts: 2
      delay: PT5S
      multiplier: 2.0
      codes: [".*"]
In this example a request is made to a URL that deliberately does not exist, so the request fails and triggers the retry mechanism. With a delay of PT5S and a multiplier of 2.0, the first retry happens after 5 seconds and the second after 10 seconds.
Catchable Errors
Errors that occur during instance execution are usually considered "catchable". Any flow state may optionally define error catchers, and if a catchable error is raised, Direktiv checks whether any catcher can handle it.
Errors have a "code", which is a string formatted in a style similar to a domain name. Error catchers can explicitly catch a single error code, or they can use * wildcards in their error codes to catch ranges of errors. Setting the error catcher to just "*" means it will handle any error, so long as no catcher defined higher up in the list has already caught it.
If no catcher is able to handle an error, the flow will fail immediately.
direktiv_api: workflow/v1
functions:
- id: http-request
  image: gcr.io/direktiv/functions/http-request:1.0
  type: knative-workflow
states:
- id: do-request
  type: action
  action:
    function: http-request
    input:
      url: http://doesnotexist.xy
    retries:
      max_attempts: 2
      delay: PT5S
      multiplier: 2.0
      codes: [".*"]
  catch:
  - error: "direktiv.retries.exceeded"
    transition: handle-error
- id: handle-error
  type: noop
  log: this did not work
In this case the flow catches the direktiv.retries.exceeded error raised once all retries have failed, transitions to handle-error, and finishes successfully. Any other error will mark the flow execution as failed.
Uncatchable Errors
Rarely, some errors are considered "uncatchable", but generally an uncatchable error becomes catchable if escalated to a calling flow. One example of this is the error triggered by Direktiv if a flow fails to complete within its maximum timeout.
If a flow fails to complete within its maximum timeout, it will not be given an opportunity to catch the error and continue running. But if that flow is running as a subflow, its parent flow will be able to detect and handle that error.
Retries
Action definitions may optionally define a retry strategy. If a retry strategy is defined, the catcher's transition won't be used and no error will be escalated for retryable errors until all retries have failed. A retry strategy might look like the following:
retries:
  max_attempts: 3
  delay: PT30S
  multiplier: 2.0
  codes: [".*"]
In this example you can see that a maximum number of attempts is defined, alongside an initial delay between attempts and a multiplication factor to apply to the delay between subsequent attempts.
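As a worked example, assume the delay before the n-th retry is the initial delay multiplied by the factor once for each earlier retry. This matches the demo above, where PT5S and 2.0 give 5 and then 10 seconds. It also assumes that max_attempts counts retries, which the source does not state explicitly. Under those assumptions the strategy above would wait:

```latex
d_n = d_1 \cdot m^{\,n-1}, \qquad d_1 = 30\,\mathrm{s}, \quad m = 2.0
\\
d_1 = 30\,\mathrm{s}, \qquad d_2 = 30 \cdot 2 = 60\,\mathrm{s}, \qquad d_3 = 30 \cdot 2^2 = 120\,\mathrm{s}
```

On this reading, the error is escalated only after roughly 210 seconds of accumulated waiting, plus the run time of each attempt.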
Recovery
Flows sometimes perform actions which may need to be reverted or undone if the flow as a whole cannot complete successfully. Solving these problems requires careful use of error catchers and transitions.
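A minimal sketch of such a recovery flow, built only from the catch, transition, and error constructs used in this article. The function ids (reserve-item, charge-card, release-item), their images, and the error code charge.failed are hypothetical placeholders, not real Direktiv functions. The idea is that if the second action fails, a catcher routes to a state that undoes the first action before the flow fails with a custom error.

```yaml
direktiv_api: workflow/v1
functions:
# Hypothetical functions: ids and images are placeholders for illustration.
- id: reserve-item
  image: example/reserve-item:1.0
  type: knative-workflow
- id: charge-card
  image: example/charge-card:1.0
  type: knative-workflow
- id: release-item
  image: example/release-item:1.0
  type: knative-workflow
states:
# Step 1: perform an action that may later need to be undone.
- id: reserve
  type: action
  action:
    function: reserve-item
  transition: charge
# Step 2: if this action fails, go to the recovery state instead of failing outright.
- id: charge
  type: action
  action:
    function: charge-card
  catch:
  # "*" handles any catchable error raised by this state.
  - error: "*"
    transition: undo-reserve
# Recovery: revert step 1, then report the failure.
- id: undo-reserve
  type: action
  action:
    function: release-item
  transition: report-failure
# Fail the flow with a custom (hypothetical) error code once the cleanup is done.
- id: report-failure
  type: error
  error: charge.failed
  message: 'charge failed, reservation released'
```

Ending in an error state keeps the flow marked as failed after the cleanup. Ending in a noop state instead would let it finish successfully, as in the handle-error example earlier.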
Cause Errors
Sometimes it is important to fail the flow with a custom error. This is possible with the error state, which can be used, for example, as the target of a transition in a switch state.
direktiv_api: workflow/v1
states:
- id: a
  type: switch
  defaultTransition: fail
  conditions:
  - condition: 'jq(.y == true)'
- id: fail
  type: error
  error: badinput
  message: 'value y not set'
In this example, if the payload does not contain y: true, the flow transitions to the fail state, which throws the error badinput and marks the flow as failed. The error badinput could be caught by a parent flow.
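A parent flow catching that error might look like the sketch below. It assumes the flow above is callable as a subflow through a function of type subflow with a workflow reference. The path check-input.yaml, the function id check-input, and the state ids are hypothetical names chosen for illustration.

```yaml
direktiv_api: workflow/v1
functions:
- id: check-input
  type: subflow
  # Hypothetical path: wherever the flow with the error state is stored.
  workflow: check-input.yaml
states:
- id: call-child
  type: action
  action:
    function: check-input
    input:
      # y is not true, so the child takes its default transition and throws badinput.
      y: false
  catch:
  # Catch the custom error code thrown by the child's error state.
  - error: badinput
    transition: handle-bad-input
- id: handle-bad-input
  type: noop
  log: child flow rejected the input
```

Because the catcher names badinput explicitly, any other error raised by the child would still fail the parent flow.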