AWS Step Functions Error Handling
- Errors can happen in a variety of ways:
- State machine definition errors (example: no matching name for a state)
- Task failures (example: exception in lambda)
- Transient issues (example: network partition event)
- Use Retry to retry a failed state, and Catch to transition the state machine to a failure path
- Note: try not to handle these errors in the application layer, because doing so increases the complexity of our application
- Some of the predefined error codes:
States.ALL: matches any error name
States.Timeout: a task ran longer than TimeoutSeconds, or no heartbeat was received within HeartbeatSeconds
States.TaskFailed: a task failed during execution (in Retry or Catch it matches any error name except States.Timeout)
States.Permissions: the task had insufficient privileges to execute the specified code
- A state may also report its own custom errors, and you can catch them by name in Step Functions
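The Retry/Catch split above lives in the state machine definition rather than in application code. Below is a minimal sketch assuming a single Lambda-backed Task state; the state names ProcessOrder and HandleFailure, the Lambda ARN, and the custom error name OrderValidationError are hypothetical placeholders. The definition is built as a Python dict and printed as Amazon States Language JSON.

```python
import json

# Hypothetical Lambda ARN, used only for illustration.
PROCESS_ORDER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-order"


def build_state_machine() -> dict:
    """Return an Amazon States Language definition as a plain dict."""
    return {
        "StartAt": "ProcessOrder",
        "States": {
            "ProcessOrder": {
                "Type": "Task",
                "Resource": PROCESS_ORDER_ARN,
                # Retriers are checked in order; the first whose ErrorEquals matches applies.
                "Retry": [
                    {
                        # Custom error reported by the task itself (hypothetical name).
                        "ErrorEquals": ["OrderValidationError"],
                        "MaxAttempts": 0,  # 0 = never retry, fall through to Catch
                    },
                    {
                        # Timeouts and other task failures: retry with exponential backoff.
                        "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "BackoffRate": 2.0,
                        "MaxAttempts": 3,
                    },
                ],
                # When no retry applies (or retries are exhausted), route to the failure path.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "Next": "HandleFailure",
                        # Keep the original input and add the error details under "error".
                        "ResultPath": "$.error",
                    }
                ],
                "End": True,
            },
            "HandleFailure": {
                "Type": "Fail",
                "Error": "OrderProcessingFailed",
                "Cause": "ProcessOrder failed after retries",
            },
        },
    }


definition = build_state_machine()
print(json.dumps(definition, indent=2))
```

Because the first matching retrier wins, the specific custom error is listed before the broader States.TaskFailed; otherwise the custom error would be retried like any other task failure.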
Retry (for Task, Parallel, and Map states)
ErrorEquals: the error name(s) this retrier matches
IntervalSeconds: number of seconds to wait before the first retry (default 1)
BackoffRate: multiplier applied to the wait after each retry, giving exponential backoff, the same pattern used across AWS services (default 2.0)
MaxAttempts: maximum number of retries (default 3); set to 0 to never retry
- When MaxAttempts is reached, the Catch block (if one is defined) kicks in
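The interplay of IntervalSeconds, BackoffRate and MaxAttempts can be checked with a little arithmetic: the wait before retry n is IntervalSeconds × BackoffRate^(n-1). The helper below (a hypothetical name, not part of any AWS SDK) lists those waits for a single retrier; it is a sketch that ignores any jitter or maximum-delay cap.

```python
def retry_delays(
    interval_seconds: float = 1.0,
    backoff_rate: float = 2.0,
    max_attempts: int = 3,
) -> list[float]:
    """Seconds waited before each retry attempt of one retrier.

    Defaults mirror the Retry field defaults (IntervalSeconds=1,
    BackoffRate=2.0, MaxAttempts=3). Retry n (1-based) waits
    interval_seconds * backoff_rate ** (n - 1).
    """
    if max_attempts < 0:
        raise ValueError("max_attempts must be >= 0")
    # range(max_attempts) yields n - 1 for n = 1..max_attempts.
    return [interval_seconds * backoff_rate ** k for k in range(max_attempts)]


print(retry_delays(2, 2.0, 3))       # [2.0, 4.0, 8.0]
print(retry_delays(max_attempts=0))  # [] (never retry)
```

With IntervalSeconds 2 and BackoffRate 2.0, three retries add up to 14 seconds of waiting before the Catch block takes over.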
Catch (for Task, Parallel, and Map states)
ErrorEquals: the error name(s) this catcher matches
Next: the state to transition to
ResultPath: a path that determines what input is sent to the state specified in the Next field
- $.error keeps the state's original input and adds the error output (its Error and Cause fields) under the error key, so the Next state receives both. For example
- the