job-retry: terraform apply always fails creating retry SQS event source mappings #5097

## Summary

Enabling `job_retry` causes `terraform apply` to fail when creating the SQS event source mapping for the retry Lambda.

The failure is:

```text
InvalidParameterValueException: The function execution role does not have permissions to call ReceiveMessage on SQS
```

## What went wrong

The `job-retry` submodule creates these resources in the same apply:

- the retry SQS queue
- the retry Lambda
- the retry Lambda IAM role
- the inline IAM policy that grants the retry Lambda access to the retry queue
- the Lambda event source mapping from the retry queue to the retry Lambda

The retry policy is defined correctly and already includes the required permissions:

- `sqs:ReceiveMessage`
- `sqs:GetQueueAttributes`
- `sqs:DeleteMessage`
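
For reference, an inline policy of this shape might look roughly like the sketch below. This is not copied from the module: the resource address `aws_iam_role_policy.job_retry` and the queue address come from the proposed fix further down, while the policy name and the `role` attribute path are assumptions made for illustration.

```hcl
# Sketch only. The policy name and the role reference are assumptions,
# not the module's actual code.
resource "aws_iam_role_policy" "job_retry" {
  name = "job-retry-sqs"                   # hypothetical name
  role = module.job_retry.lambda.role.name # assumed attribute path

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "sqs:ReceiveMessage",
        "sqs:GetQueueAttributes",
        "sqs:DeleteMessage",
      ]
      # Scope the grant to the retry queue only.
      Resource = aws_sqs_queue.job_retry_check_queue.arn
    }]
  })
}
```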

However, the event source mapping does not explicitly depend on that IAM policy resource.

Because of that, Terraform can create the event source mapping before the retry Lambda role has the queue permissions attached. AWS validates the execution role during `CreateEventSourceMapping`, does not see `ReceiveMessage` yet, and rejects the mapping.
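
To make the missing edge concrete, here is the mapping as I understand it before the fix. This is a reconstruction (the proposed fix minus its `depends_on` line), not a verbatim copy of the current source. Terraform only infers dependencies from references inside the block, and nothing here refers to the policy.

```hcl
# Reconstruction of the mapping without the fix; not a verbatim copy.
resource "aws_lambda_event_source_mapping" "job_retry" {
  # Reference to the queue: implicit dependency on the SQS queue.
  event_source_arn = aws_sqs_queue.job_retry_check_queue.arn

  # Reference to the function: implicit dependency on the Lambda
  # (and on whatever the Lambda itself references).
  function_name = module.job_retry.lambda.function.arn

  batch_size                         = var.config.lambda_event_source_mapping_batch_size
  maximum_batching_window_in_seconds = var.config.lambda_event_source_mapping_maximum_batching_window_in_seconds

  # Nothing in this block references aws_iam_role_policy.job_retry,
  # so the mapping is not ordered after the policy.
}
```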

## Observed behavior

In my case this was not intermittent. With `job_retry` enabled, every apply failed; it did not succeed even once until I added the dependency fix.

After adding an explicit dependency from the event source mapping to the retry IAM policy, the same apply succeeded cleanly.

## Reproduction

1. Use the `multi-runner` module
2. Enable `job_retry` on one or more runner configs
3. Run `terraform apply`

The apply fails while creating one or more `*-job-retry` event source mappings.

## Expected behavior

The retry Lambda IAM policy should be attached before the SQS event source mapping is created.

## Proposed fix

Add an explicit dependency in `modules/runners/job-retry/main.tf`:

```hcl
resource "aws_lambda_event_source_mapping" "job_retry" {
  event_source_arn                   = aws_sqs_queue.job_retry_check_queue.arn
  function_name                      = module.job_retry.lambda.function.arn
  batch_size                         = var.config.lambda_event_source_mapping_batch_size
  maximum_batching_window_in_seconds = var.config.lambda_event_source_mapping_maximum_batching_window_in_seconds

  depends_on = [aws_iam_role_policy.job_retry]
}
```

## Notes

This looks like a deterministic apply-ordering problem, not a missing permission in the policy definition. The retry IAM policy itself already grants the correct SQS actions.
