
Troubleshooting EDOT Cloud Forwarder for AWS

This page helps you diagnose and resolve issues with EDOT Cloud Forwarder for AWS when logs are not being forwarded to Elasticsearch as expected.

Use CloudWatch Metrics Explorer to monitor your EDOT Cloud Forwarder Lambda function:

Metric                Expected behavior
Duration              Increases with file size.
ConcurrentExecutions  Should not consistently hit the configured limit.
Errors                Should be 0.
Throttles             Should be 0.
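
The same metrics can also be read from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper name lambda_metric_sum, the 5-minute period, and the example time window are illustrative choices rather than part of the forwarder.

```shell
# Hypothetical helper: fetch one AWS/Lambda metric for a function over a time window.
# Usage: lambda_metric_sum <function-name> <metric-name> <start-iso8601> <end-iso8601> [statistic]
# The statistic defaults to Sum; Maximum is usually more useful for ConcurrentExecutions.
lambda_metric_sum() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name "$2" \
    --dimensions "Name=FunctionName,Value=$1" \
    --start-time "$3" \
    --end-time "$4" \
    --period 300 \
    --statistics "${5:-Sum}" \
    --query "Datapoints[].${5:-Sum}" \
    --output text
}

# Example (not executed here):
#   lambda_metric_sum <LAMBDA_NAME> Throttles 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
#   lambda_metric_sum <LAMBDA_NAME> ConcurrentExecutions 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z Maximum
```

Any non-zero value returned for Errors or Throttles is worth investigating.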

The LambdaLogGroup resource created by the CloudFormation stack stores all Lambda execution logs. Check these logs for processing errors, configuration issues, or data export failures.
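
As a rough sketch of how these logs might be inspected from a terminal, the helper below resolves the physical log group name from the stack's LambdaLogGroup resource and tails it. It assumes AWS CLI v2 (which provides aws logs tail); the helper name, the 1-hour default window, and the ERROR filter pattern are assumptions, and the actual log lines may use different wording.

```shell
# Hypothetical helper: tail recent forwarder logs that match a filter pattern.
# Usage: tail_forwarder_errors <stack-name> [since] [filter-pattern]
tail_forwarder_errors() {
  # Resolve the log group name from the CloudFormation logical ID LambdaLogGroup.
  log_group=$(aws cloudformation describe-stack-resource \
    --stack-name "$1" \
    --logical-resource-id LambdaLogGroup \
    --query 'StackResourceDetail.PhysicalResourceId' \
    --output text) || return 1

  # Defaults (1h, ERROR) are illustrative assumptions; adjust to your log content.
  aws logs tail "$log_group" --since "${2:-1h}" --filter-pattern "${3:-ERROR}"
}

# Example (not executed here):
#   tail_forwarder_errors <STACK_NAME> 30m
```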

If Lambda execution times out before completing, check file sizes and execution duration in CloudWatch metrics.

The default 15-minute timeout handles all typical scenarios. For large files (multiple gigabytes), increase the Lambda memory: Lambda allocates CPU in proportion to memory, so more memory means faster processing.

Insufficient concurrency typically shows up as one or more of the following:

  • Ingestion lag despite fast individual executions.
  • Non-zero values for the Throttles metric in CloudWatch.
  • The ConcurrentExecutions metric consistently at the configured limit.

Increase the EdotCloudForwarderConcurrentExecutions parameter in your CloudFormation stack.
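
Before changing the parameter, it can help to confirm the limit currently applied to the function. This sketch assumes the parameter is applied as the function's reserved concurrency, which this page implies but does not state outright; the helper name is hypothetical.

```shell
# Hypothetical helper: print the reserved concurrency configured for a Lambda function.
# Usage: reserved_concurrency <function-name>
reserved_concurrency() {
  aws lambda get-function-concurrency \
    --function-name "$1" \
    --query 'ReservedConcurrentExecutions' \
    --output text
}

# Example (not executed here):
#   reserved_concurrency <LAMBDA_NAME>
```

If the ConcurrentExecutions metric regularly reaches the value printed here, the limit is likely the bottleneck.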

You might experience one or more of the following symptoms:

  • Logs are not appearing in Elasticsearch or Kibana dashboards.
  • The S3 failure bucket contains unprocessed event files.
  • CloudWatch logs for the Lambda function show errors.
  • Lambda function metrics show increased error rates or throttling.
  • StatusCode errors when invoking the Lambda function.

  1. Check CloudWatch logs for errors

    Open the AWS CloudWatch console and navigate to the LambdaLogGroup created by the CloudFormation stack. Look for error messages that indicate:

    • Network errors when connecting to the OTLP endpoint.
    • Authentication failures due to invalid or expired API key.
    • Log type mismatches between the S3 bucket content and the EdotCloudForwarderS3LogsType setting.
  2. Verify your configuration

    Confirm that your CloudFormation stack parameters are correct:

    • OTLPEndpoint points to a valid Managed OTLP endpoint.
    • ElasticApiKey is valid and not expired.
    • EdotCloudForwarderS3LogsType matches the log format in your S3 bucket (vpcflow, elbaccess, or cloudtrail).
    • The deployment region matches your S3 bucket region.
  3. Check Lambda metrics

    In CloudWatch Metrics Explorer, review the Lambda function metrics:

    Metric                What to look for
    Errors                Increased error count indicates processing failures.
    Throttles             High throttle count suggests you need to increase EdotCloudForwarderConcurrentExecutions.
    Duration              Long durations approaching the timeout may cause incomplete processing.
    ConcurrentExecutions  Compare against your reserved concurrency limit.
  4. Replay failed events

    If events failed to process, they are stored in the S3 bucket specified by S3FailureBucketARN. Replay them by invoking the Lambda function with the replayFailedEvents trigger:

    aws lambda invoke \
      --function-name <LAMBDA_NAME> \
      --payload '{ "replayFailedEvents": {"replayFailedEvents":{"dryrun":false,"removeOnSuccess":true}}}' \
      --cli-binary-format raw-in-base64-out /dev/null
    		

    Replace <LAMBDA_NAME> with the name of your Lambda function from the deployment.

    The following options are available:

    Option           Description                                                                      Default
    dryrun           Run without processing events. Useful for understanding what would be replayed.  false
    removeOnSuccess  Remove the error event from the S3 failure bucket after successful processing.   true

    Long execution durations approaching the timeout may cause incomplete processing. Consider increasing EdotCloudForwarderMemorySize for faster processing.

    Tip

    Use --timeout with the AWS CLI to increase the Lambda timeout for custom invocations. If a timeout occurs, run the command multiple times to process all error events.

  5. Adjust sizing if needed

    If you're experiencing throttling or timeouts, consider adjusting the Lambda configuration. Refer to Sizing and performance tuning for recommendations based on your log volume.
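
The replay invocation from step 4 can be wrapped so that a dry run comes first. The sketch below reuses the payload shape shown in that step; the helper names are hypothetical, and the defaults here (dryrun true, removeOnSuccess false) are deliberately more cautious than the documented defaults.

```shell
# Hypothetical helper: build the replay payload in the shape shown in step 4.
# Usage: replay_payload <dryrun: true|false> <removeOnSuccess: true|false>
replay_payload() {
  printf '{ "replayFailedEvents": {"replayFailedEvents":{"dryrun":%s,"removeOnSuccess":%s}}}' "$1" "$2"
}

# Hypothetical helper: invoke the forwarder with a replay payload.
# Usage: replay_failed_events <function-name> [dryrun] [removeOnSuccess]
# Defaults to a dry run that leaves the failure bucket untouched.
replay_failed_events() {
  aws lambda invoke \
    --function-name "$1" \
    --payload "$(replay_payload "${2:-true}" "${3:-false}")" \
    --cli-binary-format raw-in-base64-out /dev/null
}

# Example (not executed here):
#   replay_failed_events <LAMBDA_NAME>              # dry run first
#   replay_failed_events <LAMBDA_NAME> false true   # then replay and clean up
```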

  • Monitor CloudWatch metrics regularly to catch issues early.
  • Set up CloudWatch alarms for Lambda errors and throttles.
  • Keep your API key up to date and rotate it before expiration.
  • Start with default sizing and increase concurrency or memory only when metrics indicate a need.
  • Ensure each log type uses a dedicated S3 bucket and CloudFormation stack.
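
As one possible way to set up the alarms mentioned above, the sketch below creates a CloudWatch alarm that fires when a Lambda metric such as Errors or Throttles rises above zero in a 5-minute period. The helper name, alarm naming scheme, period, and threshold are assumptions. No alarm action is attached, so add --alarm-actions with an SNS topic ARN if you want notifications.

```shell
# Hypothetical helper: alarm when a Lambda metric exceeds 0 within a 5-minute period.
# Usage: create_forwarder_alarm <function-name> <metric-name>
create_forwarder_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-$2" \
    --namespace AWS/Lambda \
    --metric-name "$2" \
    --dimensions "Name=FunctionName,Value=$1" \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching
}

# Example (not executed here):
#   create_forwarder_alarm <LAMBDA_NAME> Errors
#   create_forwarder_alarm <LAMBDA_NAME> Throttles
```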