Troubleshooting EDOT Cloud Forwarder for AWS
This page helps you diagnose and resolve issues with EDOT Cloud Forwarder for AWS when logs are not being forwarded to Elasticsearch as expected.
Use CloudWatch Metrics Explorer to monitor your EDOT Cloud Forwarder Lambda function:
| Metric | Expected behavior |
|---|---|
| Duration | Increases with file size. |
| ConcurrentExecutions | Should not consistently hit the configured limit. |
| Errors | Should be 0. |
| Throttles | Should be 0. |
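The checks in this table can also be scripted. The following is a minimal Python sketch, not part of EDOT Cloud Forwarder: `metric_query` builds keyword arguments for the CloudWatch GetMetricStatistics API in the `AWS/Lambda` namespace, and `assess` applies the expected behavior from the table to values you have already fetched. The function names, the one-hour window, the 5-minute period, and the placeholder function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Statistic to request for each metric in the table above.
STATISTICS = {
    "Duration": "Maximum",
    "ConcurrentExecutions": "Maximum",
    "Errors": "Sum",
    "Throttles": "Sum",
}


def metric_query(function_name, metric, end=None, hours=1):
    """Build kwargs for CloudWatch GetMetricStatistics for one Lambda metric."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets (assumed granularity)
        "Statistics": [STATISTICS[metric]],
    }


def assess(errors, throttles, concurrency_samples, concurrency_limit):
    """Return a list of findings; an empty list means the metrics look healthy."""
    findings = []
    if errors > 0:
        findings.append("Errors is non-zero: check the Lambda execution logs.")
    if throttles > 0:
        findings.append("Throttles is non-zero: concurrency may be too low.")
    # "Consistently" is interpreted here as: every sampled period hit the limit.
    if concurrency_samples and all(s >= concurrency_limit for s in concurrency_samples):
        findings.append("ConcurrentExecutions stayed at the configured limit.")
    return findings


if __name__ == "__main__":
    # "my-edot-forwarder" is a placeholder, not a name created by the stack.
    print(metric_query("my-edot-forwarder", "Errors")["Statistics"])
    print(assess(errors=0, throttles=3, concurrency_samples=[5, 5], concurrency_limit=5))
```

If you use boto3, the dictionary returned by `metric_query` can be passed as keyword arguments to the CloudWatch client's `get_metric_statistics` method.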
The LambdaLogGroup resource created by the CloudFormation stack stores all Lambda execution logs. Check these logs for processing errors, configuration issues, or data export failures.
Lambda execution times out before completing. Check file sizes and execution duration in CloudWatch metrics.
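The default 15-minute timeout handles all typical scenarios. For large files (multiple gigabytes), increase the Lambda memory setting (EdotCloudForwarderMemorySize): Lambda allocates CPU in proportion to configured memory, so more memory means faster processing.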
- Ingestion lag despite fast individual executions.
- Throttles metric showing non-zero values in CloudWatch.
- ConcurrentExecutions metric consistently at the configured limit.
Increase the EdotCloudForwarderConcurrentExecutions parameter in your CloudFormation stack.
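One way to change a single parameter without re-entering the others is to update the stack with its previous template and previous values. The sketch below only builds the `Parameters` list for the CloudFormation UpdateStack API. The helper name is hypothetical, the value `10` is an arbitrary example, and the list of parameter names must come from your own stack (for example from DescribeStacks); the names shown are the ones mentioned on this page and may not be the complete set.

```python
def build_update_parameters(overrides, parameter_names):
    """Build the Parameters list for CloudFormation UpdateStack.

    Parameters named in `overrides` get a new value. All others keep their
    previous value, so secrets such as the API key are not re-entered.
    """
    unknown = set(overrides) - set(parameter_names)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = []
    for name in parameter_names:
        if name in overrides:
            params.append({"ParameterKey": name, "ParameterValue": str(overrides[name])})
        else:
            params.append({"ParameterKey": name, "UsePreviousValue": True})
    return params


if __name__ == "__main__":
    # Example parameter list; replace it with the parameters of your stack.
    names = [
        "OTLPEndpoint",
        "ElasticApiKey",
        "EdotCloudForwarderS3LogsType",
        "EdotCloudForwarderConcurrentExecutions",
    ]
    for p in build_update_parameters({"EdotCloudForwarderConcurrentExecutions": 10}, names):
        print(p)
```

With boto3, the result could be passed to the CloudFormation client's `update_stack` method together with `UsePreviousTemplate=True` and whatever capabilities your stack requires.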
You might experience one or more of the following symptoms:
- Logs are not appearing in Elasticsearch or Kibana dashboards.
- The S3 failure bucket contains unprocessed event files.
- CloudWatch logs for the Lambda function show errors.
- Lambda function metrics show increased error rates or throttling.
- StatusCode errors when invoking the Lambda function.
- Check CloudWatch logs for errors
  Open the AWS CloudWatch console and navigate to the LambdaLogGroup created by the CloudFormation stack. Look for error messages that indicate:
  - Network errors when connecting to the OTLP endpoint.
  - Authentication failures due to an invalid or expired API key.
  - Log type mismatches between the S3 bucket content and the EdotCloudForwarderS3LogsType setting.
- Verify your configuration
  Confirm that your CloudFormation stack parameters are correct:
  - OTLPEndpoint points to a valid Managed OTLP endpoint.
  - ElasticApiKey is valid and not expired.
  - EdotCloudForwarderS3LogsType matches the log format in your S3 bucket (vpcflow, elbaccess, or cloudtrail).
  - The deployment region matches your S3 bucket region.
- Check Lambda metrics
  In CloudWatch Metrics Explorer, review the Lambda function metrics:
  | Metric | What to look for |
  |---|---|
  | Errors | Increased error count indicates processing failures. |
  | Throttles | High throttle count suggests you need to increase EdotCloudForwarderConcurrentExecutions. |
  | Duration | Long durations approaching the timeout may cause incomplete processing. Consider increasing EdotCloudForwarderMemorySize for faster processing. |
  | ConcurrentExecutions | Compare against your reserved concurrency limit. |
- Replay failed events
  If events failed to process, they are stored in the S3 bucket specified by S3FailureBucketARN. Replay them by invoking the Lambda function with the replayFailedEvents trigger:

      aws lambda invoke \
        --function-name <LAMBDA_NAME> \
        --payload '{ "replayFailedEvents": {"replayFailedEvents":{"dryrun":false,"removeOnSuccess":true}}}' \
        --cli-binary-format raw-in-base64-out /dev/null

  Replace <LAMBDA_NAME> with the name of your Lambda function from the deployment. The following options are available:
  | Option | Description | Default |
  |---|---|---|
  | dryrun | Run without processing events. Useful for understanding what would be replayed. | false |
  | removeOnSuccess | Remove the error event from the S3 failure bucket after successful processing. | true |
  Tip: Use --timeout with the AWS CLI to increase the Lambda timeout for custom invocations. If a timeout occurs, run the command multiple times to process all error events.
- Adjust sizing if needed
  If you're experiencing throttling or timeouts, consider adjusting the Lambda configuration. Refer to Sizing and performance tuning for recommendations based on your log volume.
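If you replay failed events often, it can help to generate the invocation instead of editing the JSON by hand. The following Python sketch builds the replay payload and the matching `aws lambda invoke` command line. It mirrors the payload nesting shown in the replay step as-is; the helper names and the placeholder function name are hypothetical, and the script only prints the command rather than running it.

```python
import json
import shlex


def replay_payload(dryrun=False, remove_on_success=True):
    """Return the replay payload as compact JSON.

    The nesting mirrors the command shown in the replay step; the defaults
    match the documented option defaults.
    """
    options = {"dryrun": dryrun, "removeOnSuccess": remove_on_success}
    payload = {"replayFailedEvents": {"replayFailedEvents": options}}
    return json.dumps(payload, separators=(",", ":"))


def replay_command(function_name, **options):
    """Return a shell-quoted `aws lambda invoke` command for the replay."""
    args = [
        "aws", "lambda", "invoke",
        "--function-name", function_name,
        "--payload", replay_payload(**options),
        "--cli-binary-format", "raw-in-base64-out",
        "/dev/null",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # "my-edot-forwarder" is a placeholder for your Lambda function name.
    print(replay_command("my-edot-forwarder", dryrun=True))   # inspect first
    print(replay_command("my-edot-forwarder"))                # then replay
```

Running a dry run before the real replay lets you see what would be processed before any event is removed from the failure bucket.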
- Monitor CloudWatch metrics regularly to catch issues early.
- Set up CloudWatch alarms for Lambda errors and throttles.
- Keep your API key up to date and rotate it before expiration.
- Start with default sizing and increase concurrency or memory only when metrics indicate a need.
- Ensure each log type uses a dedicated S3 bucket and CloudFormation stack.
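As an illustration of the alarm recommendation above, the following Python sketch builds keyword arguments for the CloudWatch PutMetricAlarm API so that an alarm fires whenever Errors or Throttles rises above 0. The helper name, the alarm naming scheme, the 5-minute period, and the choice to treat missing data as not breaching are assumptions; adjust them to your environment.

```python
def alarm_definition(function_name, metric):
    """Build kwargs for CloudWatch PutMetricAlarm that fire when `metric` > 0."""
    if metric not in ("Errors", "Throttles"):
        raise ValueError("metric must be 'Errors' or 'Throttles'")
    return {
        "AlarmName": f"{function_name}-{metric.lower()}",  # assumed naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": metric,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,              # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations is not an error
    }


if __name__ == "__main__":
    # "my-edot-forwarder" is a placeholder for your Lambda function name.
    for metric in ("Errors", "Throttles"):
        print(alarm_definition("my-edot-forwarder", metric))
```

With boto3, each dictionary could be passed to the CloudWatch client's `put_metric_alarm` method, adding `AlarmActions` if you want notifications.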