Troubleshooting EDOT Cloud Forwarder for AWS

This page helps you diagnose and resolve issues with EDOT Cloud Forwarder for AWS when logs are not being forwarded to Elasticsearch as expected.

Key metrics to monitor

Use CloudWatch Metrics Explorer to monitor your EDOT Cloud Forwarder Lambda function:

Metric	Expected behavior
Duration	Increases with file size.
ConcurrentExecutions	Should not consistently hit the configured limit.
Errors	Should be 0.
Throttles	Should be 0.
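
The same metrics can also be read from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; `lambda_metric_stat` is a hypothetical helper name and not part of EDOT Cloud Forwarder. It queries the standard `AWS/Lambda` namespace for one metric of one function in 5-minute buckets.

```shell
# Sketch: print one statistic of a Lambda metric over a time window.
# lambda_metric_stat is a hypothetical helper name (an assumption, not an EDOT tool).
# Usage: lambda_metric_stat <function-name> <metric> <start-iso8601> <end-iso8601> [statistic]
lambda_metric_stat() {
  local fn="$1" metric="$2" start="$3" end="$4" stat="${5:-Sum}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name "$metric" \
    --dimensions "Name=FunctionName,Value=${fn}" \
    --start-time "$start" \
    --end-time "$end" \
    --period 300 \
    --statistics "$stat" \
    --query "sort_by(Datapoints, &Timestamp)[].[Timestamp, ${stat}]" \
    --output text
}
```

For example, `lambda_metric_stat my-forwarder Throttles 2024-05-01T00:00:00Z 2024-05-01T01:00:00Z` lists throttle counts per bucket, where `my-forwarder` is a placeholder function name. For `Duration` and `ConcurrentExecutions`, passing `Maximum` as the last argument is usually more informative than `Sum`.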

The LambdaLogGroup resource created by the CloudFormation stack stores all Lambda execution logs. Check these logs for processing errors, configuration issues, or data export failures.

Lambda timeouts

Symptoms

Lambda execution times out before completing. Check file sizes and execution duration in CloudWatch metrics.

Resolution

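The default 15-minute timeout is sufficient for typical workloads. For very large files (multiple gigabytes), increase the function's memory through the EdotCloudForwarderMemorySize parameter. Lambda allocates CPU in proportion to memory, so more memory means faster processing.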
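
Before changing anything, it can help to confirm what the deployed function is currently using. This is a small sketch, assuming the AWS CLI is configured for the deployment region; `show_lambda_sizing` is a hypothetical helper name.

```shell
# Sketch: show the timeout (seconds) and memory (MB) of a deployed Lambda function.
# show_lambda_sizing is a hypothetical helper name (an assumption).
# Usage: show_lambda_sizing <function-name>
show_lambda_sizing() {
  aws lambda get-function-configuration \
    --function-name "$1" \
    --query '{TimeoutSeconds: Timeout, MemoryMB: MemorySize}' \
    --output json
}
```

Comparing `TimeoutSeconds` with the `Duration` metric shows how close executions come to the limit.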

Concurrency throttling

Symptoms

- Ingestion lag despite fast individual executions.
- Throttles metric showing non-zero values in CloudWatch.
- ConcurrentExecutions metric consistently at the configured limit.

Resolution

Increase the EdotCloudForwarderConcurrentExecutions parameter in your CloudFormation stack.
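
One way to change a single stack parameter from the command line is sketched below. It rests on several assumptions: the stack was deployed with CloudFormation directly, the template needs the IAM capabilities shown, and no parameter value contains spaces. `set_stack_parameter` is a hypothetical helper name. All other parameters are passed with `UsePreviousValue=true` so they keep their current values.

```shell
# Sketch: update one CloudFormation parameter and keep every other parameter unchanged.
# set_stack_parameter is a hypothetical helper name (an assumption).
# Usage: set_stack_parameter <stack-name> <parameter-key> <new-value>
set_stack_parameter() {
  local stack="$1" key="$2" value="$3" params="" k
  # Read the parameter keys the stack currently defines.
  for k in $(aws cloudformation describe-stacks \
      --stack-name "$stack" \
      --query 'Stacks[0].Parameters[].ParameterKey' \
      --output text); do
    if [ "$k" = "$key" ]; then
      params="$params ParameterKey=$k,ParameterValue=$value"
    else
      params="$params ParameterKey=$k,UsePreviousValue=true"
    fi
  done
  # $params is intentionally unquoted so each entry becomes its own argument.
  # The capabilities listed are an assumption about what the template requires.
  aws cloudformation update-stack \
    --stack-name "$stack" \
    --use-previous-template \
    --parameters $params \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  && aws cloudformation wait stack-update-complete --stack-name "$stack"
}
```

For example, `set_stack_parameter my-ecf-stack EdotCloudForwarderConcurrentExecutions 10` would raise the limit to 10, where `my-ecf-stack` and the value are placeholders. Updating the parameter in the CloudFormation console achieves the same result.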

Failed log forwarding

Symptoms

You might experience one or more of the following symptoms:

- Logs are not appearing in Elasticsearch or Kibana dashboards.
- The S3 failure bucket contains unprocessed event files.
- CloudWatch logs for the Lambda function show errors.
- Lambda function metrics show increased error rates or throttling.
- StatusCode errors when invoking the Lambda function.

Resolution

Check CloudWatch logs for errors

Open the AWS CloudWatch console and navigate to the LambdaLogGroup created by the CloudFormation stack. Look for error messages that indicate:
- Network errors when connecting to the OTLP endpoint.
- Authentication failures due to an invalid or expired API key.
- Log type mismatches between the S3 bucket content and the EdotCloudForwarderS3LogsType setting.
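
The log search can also be done from a terminal. The sketch below assumes AWS CLI v2 (for `aws logs tail`) and that the log group's logical ID in the stack is exactly `LambdaLogGroup`; both function names are hypothetical. The filter pattern matches lines containing any of the listed terms, which is only a rough heuristic for error messages.

```shell
# Sketch: resolve the physical log group name behind the stack's LambdaLogGroup resource.
# forwarder_log_group and recent_forwarder_errors are hypothetical helper names.
# Usage: forwarder_log_group <stack-name>
forwarder_log_group() {
  aws cloudformation describe-stack-resource \
    --stack-name "$1" \
    --logical-resource-id LambdaLogGroup \
    --query 'StackResourceDetail.PhysicalResourceId' \
    --output text
}

# Print recent log lines that contain ERROR, Error, or error.
# Usage: recent_forwarder_errors <stack-name> [since, default 1h]
recent_forwarder_errors() {
  local group
  group=$(forwarder_log_group "$1") || return 1
  aws logs tail "$group" \
    --since "${2:-1h}" \
    --filter-pattern '?ERROR ?Error ?error' \
    --format short
}
```

For example, `recent_forwarder_errors my-ecf-stack 30m` prints matching lines from the last 30 minutes, where `my-ecf-stack` is a placeholder stack name.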

Verify your configuration

Confirm that your CloudFormation stack parameters are correct:
- OTLPEndpoint points to a valid Managed OTLP endpoint.
- ElasticApiKey is valid and not expired.
- EdotCloudForwarderS3LogsType matches the log format in your S3 bucket (vpcflow, elbaccess, or cloudtrail).
- The deployment region matches your S3 bucket region.
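
These checks can be partly scripted. The following sketch assumes the AWS CLI is configured; `show_stack_parameters` and `bucket_region` are hypothetical helper names. Secret parameters such as the API key may appear masked in the output if the template marks them as NoEcho, which is an assumption about the template.

```shell
# Sketch: list the parameter values a stack was deployed with.
# show_stack_parameters and bucket_region are hypothetical helper names.
# Usage: show_stack_parameters <stack-name>
show_stack_parameters() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Parameters[].[ParameterKey, ParameterValue]' \
    --output table
}

# Print the region of an S3 bucket.
# Usage: bucket_region <bucket-name>
bucket_region() {
  local region
  region=$(aws s3api get-bucket-location \
    --bucket "$1" \
    --query 'LocationConstraint' \
    --output text) || return 1
  # Buckets in us-east-1 report no location constraint, which text output renders as "None".
  if [ "$region" = "None" ]; then
    region="us-east-1"
  fi
  echo "$region"
}
```

Comparing the output of `bucket_region` with the region the stack was deployed in confirms the last item in the list.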

Check Lambda metrics

In CloudWatch Metrics Explorer, review the Lambda function metrics:

Metric	What to look for
`Errors`	Increased error count indicates processing failures.
`Throttles`	High throttle count suggests you need to increase `EdotCloudForwarderConcurrentExecutions`.
`Duration`	Long durations approaching the timeout may cause incomplete processing. Consider increasing `EdotCloudForwarderMemorySize` for faster processing.
`ConcurrentExecutions`	Compare against your reserved concurrency limit.

Replay failed events

If events failed to process, they are stored in the S3 bucket specified by S3FailureBucketARN. Replay them by invoking the Lambda function with the replayFailedEvents trigger:

    aws lambda invoke \
      --function-name <LAMBDA_NAME> \
      --payload '{ "replayFailedEvents": {"replayFailedEvents":{"dryrun":false,"removeOnSuccess":true}}}' \
      --cli-binary-format raw-in-base64-out /dev/null

Replace <LAMBDA_NAME> with the name of your Lambda function from the deployment.

The following options are available:

Option	Description	Default
`dryrun`	Run without processing events. Useful for understanding what would be replayed.	`false`
`removeOnSuccess`	Remove the error event from the S3 failure bucket after successful processing.	`true`

Tip

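For long replays, raise the AWS CLI read timeout with the global --cli-read-timeout option so that the CLI keeps waiting for the invocation to finish (the CLI default is 60 seconds). If the Lambda function itself times out, run the command again until all error events are processed.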
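
A cautious workflow is to run a dry run first and replay for real only after reviewing the result. The sketch below wraps the invocation shown earlier and mirrors its payload structure exactly. `replay_failed_events` is a hypothetical helper name, the response file name is an arbitrary choice, and the helper defaults `dryrun` to `true`, unlike the Lambda function's own default of `false`. It passes the AWS CLI's global `--cli-read-timeout` option (in seconds) so the client waits for a long invocation.

```shell
# Sketch: invoke the forwarder with the replayFailedEvents trigger.
# replay_failed_events is a hypothetical helper name (an assumption).
# Usage: replay_failed_events <function-name> [dryrun=true] [removeOnSuccess=true] [response-file]
replay_failed_events() {
  local fn="$1" dryrun="${2:-true}" remove="${3:-true}" out="${4:-replay-response.json}"
  aws lambda invoke \
    --function-name "$fn" \
    --payload "{\"replayFailedEvents\":{\"replayFailedEvents\":{\"dryrun\":${dryrun},\"removeOnSuccess\":${remove}}}}" \
    --cli-binary-format raw-in-base64-out \
    --cli-read-timeout 900 \
    "$out"
}
```

Running `replay_failed_events my-forwarder` performs a dry run, and `replay_failed_events my-forwarder false true` replays the events and removes them on success, where `my-forwarder` is a placeholder. The CLI prints the invocation's `StatusCode` and, if the function failed, a `FunctionError` field.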

Adjust sizing if needed

If you're experiencing throttling or timeouts, consider adjusting the Lambda configuration. Refer to Sizing and performance tuning for recommendations based on your log volume.

Best practices

- Monitor CloudWatch metrics regularly to catch issues early.
- Set up CloudWatch alarms for Lambda errors and throttles.
- Keep your API key up to date and rotate it before expiration.
- Start with default sizing and increase concurrency or memory only when metrics indicate a need.
- Ensure each log type uses a dedicated S3 bucket and CloudFormation stack.
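
As an illustration of the alarm recommendation, the sketch below creates an alarm that fires when a Lambda metric exceeds zero in any 5-minute period. The helper name `create_forwarder_alarm`, the alarm naming scheme, and the threshold and period values are all assumptions to adjust for your environment.

```shell
# Sketch: alarm when a Lambda metric (for example Errors or Throttles) is above 0.
# create_forwarder_alarm is a hypothetical helper name (an assumption).
# Usage: create_forwarder_alarm <function-name> <metric>
create_forwarder_alarm() {
  local fn="$1" metric="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "edot-cloud-forwarder-${fn}-${metric}" \
    --namespace AWS/Lambda \
    --metric-name "$metric" \
    --dimensions "Name=FunctionName,Value=${fn}" \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching
}
```

Calling it once with `Errors` and once with `Throttles` covers both metrics. As written, the alarm only changes state. To receive notifications, add `--alarm-actions` with the ARN of an SNS topic.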