﻿---
title: Common problems with Fleet and Elastic Agent
description: We have collected the most common known problems and listed them here. If your problem is not described here, escalate as needed. Elastic Agent enrollment,...
url: https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/ingest/fleet/common-problems
products:
  - Elastic Agent
  - Fleet
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Common problems with Fleet and Elastic Agent
We have collected the most common known problems and listed them here. If your problem is not described here, [escalate as needed](/elastic/docs-builder/docs/3028/troubleshoot/ingest/fleet/fleet-elastic-agent#troubleshooting-intro-escalate).

## Troubleshooting contents

**Elastic Agent enrollment, upgrade, and unenrollment**
- [Elastic Agent enrollment fails on the host with `x509: certificate signed by unknown authority` message](#agent-enrollment-certs)
- [Elastic Agent enrollment fails on the host with `x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs` message](#es-enrollment-certs)
- [Elastic Agent enrollment fails on the host with `Client.Timeout exceeded` message](#agent-enrollment-timeout)
- [Elastic Agent enrollment fails on the host with `Error while dialing: open \\\\.\\pipe\\elastic-agent-system: The system cannot find the file specified` message](#agent-enrollment-dialing)
- [Elastic Agent fails to enroll with Fleet Server running on localhost](#mac-file-sharing)
- [Elastic Agent hangs while unenrolling](#agent-hangs-while-unenrolling)
- [Elastic Agent is automatically unenrolled after failed check-ins with 401 errors](#agent-auto-unenroll-401) (Deprecated 9.1)
- [Elastic Agent upgrade fails on Windows with exit status `0xc0000142`](#agent-upgrade-fail-windows)
- [Elastic Agent unenroll fails](#deleted-policy-unenroll)
- [Uninstalling Elastic Endpoint fails](#endpoint-not-uninstalled-with-agent)

---

**Elastic Agent status**
- [Retrieve the Elastic Agent version](#trb-retrieve-agent-version)
- [Check the Elastic Agent status](#trb-check-agent-status)
- [Capture Elastic Agent diagnostics](https://www.elastic.co/elastic/docs-builder/docs/3028/troubleshoot/ingest/fleet/diagnostics)
- [Some problems occur so early that insufficient logging is available](#not-installing-no-logs-in-terminal)
- [Elastic Agent is cited as `Healthy` but still has set up problems sending data to Elasticsearch](#agent-healthy-but-no-data-in-es)
- [Elastic Agent is stuck in status `Updating`](#fleet-agent-stuck-on-updating)

---

**Authentication and access**
- [Elasticsearch authentication service fails with `Authentication using apikey failed` message](#es-apikey-failed)
- [Elastic Agent fails with `Agent process is not root/admin or validation failed` message](#process-not-root)
- [API key is unauthorized to send telemetry to `.logs-endpoint.diagnostic.collection-*` indices](#endpoint-unauthorized)
- [Error when running Elastic Agent commands with `sudo`](#agent-sudo-error)

---

**Fleet Server and Elastic Agent**
- [On Fleet Server startup, ERROR seen with `State changed to CRASHED: exited with code: 1`](#ca-cert-testing)
- [Fleet Server is running and healthy with data, but other Agents cannot use it to connect to Elasticsearch](#secondary-agent-not-connecting)

---

**Elastic Agent and integrations**
- [Integration policy upgrade has too many conflicts](#upgrading-integration-too-many-conflicts)
- [Elastic Agents are unable to connect after removing the Fleet Server integration](#fleet-server-integration-removed)
- [illegal_argument_exception when TSDB is enabled](#tsdb-illegal-argument)
- [The `/api/fleet/setup` endpoint can’t reach the package registry to install Integrations](#fleet-setup-fails)

---

**Elastic Cloud and Kibana issues**
- [Fleet in Kibana crashes](#fleet-app-crashes)
- [Hosted Elastic Agent is offline](#hosted-agent-offline)
- [Elastic Agents hosted on Elastic Cloud are stuck in `Updating` or `Offline`](#agents-in-cloud-stuck-at-updating)
- [When using Elastic Cloud, Fleet Server is not listed in Kibana](#fleet-server-not-in-kibana-cloud)

---

**Elastic Agent on Kubernetes**
- [Elastic Agent Out of Memory errors on Kubernetes](#agent-oom-k8s)
- [Troubleshoot Elastic Agent installation on Kubernetes, with Kustomize](#agent-kubernetes-kustomize)
- [Troubleshoot Elastic Agent on Kubernetes seeing `invalid api key to authenticate with fleet` in logs](#agent-kubernetes-invalid-api-key)

---

**Air-gapped environments**
- [Kibana cannot connect to Elastic Package Registry in air-gapped environments](#fleet-errors-tls)

---


## Elastic Agent enrollment, upgrade, and unenrollment


### Elastic Agent enrollment fails on the host with `x509: certificate signed by unknown authority` message

To ensure that communication with Fleet Server is encrypted, Fleet Server requires Elastic Agents to present a signed certificate. In a self-managed cluster, if you don’t specify certificates when you set up Fleet Server, self-signed certificates are generated automatically.
If you attempt to enroll an Elastic Agent in a Fleet Server with a self-signed certificate, you will encounter the following error:
```sh
Error: fail to enroll: fail to execute request to fleet-server: x509: certificate signed by unknown authority
Error: enroll command failed with exit code: 1
```

To fix this problem, pass the `--insecure` flag along with the `enroll` or `install` command. For example:
```sh
sudo ./elastic-agent install --url=https://<fleet-server-ip>:8220 --enrollment-token=<token> --insecure
```

Traffic between Elastic Agents and Fleet Server over HTTPS is encrypted.
By adding this flag, you are acknowledging that you understand that the certificate chain cannot be verified.
Allowing Fleet Server to generate self-signed certificates is useful to get things running for development, but not recommended in a production environment.
For more information, refer to [Configure SSL/TLS for self-managed Fleet Servers](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/secure-connections).

### Elastic Agent enrollment fails on the host with `x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs` message

To ensure that communication with Elasticsearch is encrypted, Fleet Server requires Elasticsearch to present a signed certificate.
This error occurs when you use self-signed certificates with Elasticsearch using IP as a Common Name (CN). With IP as a CN, Fleet Server looks into subject alternative names (SANs), which are empty. To work around this situation, use the `--fleet-server-es-insecure` flag to deactivate certificate verification.
You will also need to set `ssl.verification_mode: none` in the Output settings in Fleet and Integrations UI.
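
Before turning off verification, it can help to confirm that the certificate really lacks an IP SAN. The following is a minimal sketch, assuming OpenSSL is installed and the Elasticsearch certificate is available as a local PEM file. The `has_ip_san` helper, the `es.crt` file name, and the sample IP address are hypothetical and used only for illustration.

```shell
# has_ip_san: read the text form of a certificate on stdin and succeed
# only if it lists the given IP address as a subject alternative name.
# Hypothetical helper for illustration; not part of Elastic Agent.
has_ip_san() {
  ip="$1"
  # openssl prints SAN entries as a comma-separated list,
  # for example "DNS:es01, IP Address:10.0.0.5".
  # Split on commas, trim whitespace, then require an exact line match.
  tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -Fxq "IP Address:${ip}"
}

# Example usage (assumes the certificate was saved as es.crt):
#   openssl x509 -in es.crt -noout -text | has_ip_san 10.0.0.5 \
#     && echo "IP SAN present" || echo "IP SAN missing"
```

If the check reports that the IP SAN is missing, the error above is expected. Reissuing the certificate with the IP address in its SAN list would remove the need for the insecure flag.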

### Elastic Agent enrollment fails on the host with `Client.Timeout exceeded` message

To enroll in Fleet, Elastic Agent must connect to the Fleet Server instance. If the agent cannot connect, you get failures similar to these:
```txt
fail to enroll: fail to execute request to Fleet Server:Post http://fleet-server:8220/api/fleet/agents/enroll?: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```

Here are several steps to help you troubleshoot the problem.
1. Check for networking problems. From the host, run the `ping` command to confirm that it can reach the Fleet Server instance.
2. Additionally, `curl` the `/status` API of Fleet Server:
   ```shell
   curl -f http://<fleet-server-url>:8220/api/status
   ```
3. Verify that you have specified the correct Kibana Fleet settings URL and port for your environment.
   By default, Fleet Server expects the HTTPS protocol and port 8220 to communicate with Elasticsearch unless you have explicitly configured otherwise.
4. Check that you specified a valid enrollment token during enrollment. To do this:
   1. In Fleet, select **Enrollment tokens**.
   2. To view the secret, click the eyeball icon. The secret should match the string that you used to enroll Elastic Agent on your host.
   3. If the secret doesn’t match, create a new enrollment token and use this token when you run the `elastic-agent enroll` command.
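
The first two checks above can be combined into one small script. This is a rough sketch, assuming `ping` and `curl` are available on the host. The function names and the host name in the usage comment are hypothetical. ICMP might be blocked in some networks, so a failed ping does not always mean that Fleet Server is unreachable.

```shell
# fleet_status_url: build the Fleet Server status URL for a host.
# Port and scheme default to 8220 and https; override them if your
# deployment differs. Hypothetical helper for illustration.
fleet_status_url() {
  host="$1"
  port="${2:-8220}"
  scheme="${3:-https}"
  printf '%s://%s:%s/api/status\n' "$scheme" "$host" "$port"
}

# check_fleet_server: ping the host once, then request the status API.
# Arguments: host [port] [scheme]
check_fleet_server() {
  ping -c 1 "$1" >/dev/null 2>&1 || { echo "ping to $1 failed"; return 1; }
  curl -f --max-time 10 "$(fleet_status_url "$@")"
}

# Example usage (the host name is a placeholder):
#   check_fleet_server fleet-server.example.internal
```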


### Elastic Agent enrollment fails on the host with `Error while dialing: open \\\\.\\pipe\\elastic-agent-system: The system cannot find the file specified` message

Elastic Agent might fail to install in a Windows environment due to port conflicts and file locks, returning this error:
```txt
Restart attempt 2 failed: 'rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: open \\\\.\\pipe\\elastic-agent-system: The system cannot find the file specified.
```

To resolve port conflicts:
1. Check for any processes that are using port 6789 or 6790:
   ```bash
   netstat -ano | findstr :6789
   netstat -ano | findstr :6790
   ```
   This will return the process ID (PID) of the application that's using the specified port. You can then identify the application using its PID:
   ```bash
   tasklist /fi "pid eq <APP-PID>"
   ```
2. In case of a port conflict, update `agent.grpc.port` in the `elastic-agent.yml` file to bind the agent to a different port (for example, 6790).


### Elastic Agent fails to enroll with Fleet Server running on localhost

If you’re testing Fleet Server locally on a macOS system using localhost (`https://127.0.0.1:8220`) as the Host URL, you might encounter this error:
```sh
Error: fail to enroll: fail to execute request to fleet-server:
lookup My-MacBook-Pro.local: no such host
```

This can occur on newer macOS software. To resolve the problem, [ensure that file sharing is enabled](https://support.apple.com/en-ca/guide/mac-help/mh17131/mac) on your local system.

### Elastic Agent hangs while unenrolling

When unenrolling Elastic Agent, Fleet waits for acknowledgment from the agent before it completes the unenroll process. If Fleet doesn’t receive an acknowledgment, the status hangs at `unenrolling`.
You can unenroll an agent to invalidate all API keys related to the agent and change the status to `inactive` so that the agent no longer appears in Fleet.
1. In Fleet, select **Agents**.
2. Under Agents, choose **Unenroll agent** from the **Actions** menu next to the agent you want to unenroll.
3. Click **Force unenroll**.


### Elastic Agent is automatically unenrolled after failed check-ins with 401 errors

<applies-to>
  - Elastic Stack: Deprecated since 9.1
</applies-to>

In Elastic Agent versions prior to 8.19.0 and 9.1.0, if an agent receives a 401 (Unauthorized) error on more than seven consecutive check-ins with Fleet Server, the agent is automatically unenrolled.
To resolve the issue:
<stepper>
  <step title="Re-enroll the agent">
    - If the agent is still installed on the host, re-enroll it in Fleet to keep the agent's existing state, including any previously ingested data:
      1. Open the **Agents** tab, then click **Add agent**.
      2. In the **Add agent** flyout, select the agent policy in which to re-enroll the agent.
      3. In the **Authentication settings** section, select an enrollment token:
         - If one or more active enrollment tokens exist for your agent policy, select one from the dropdown.
         - If no active tokens exist, click **Create enrollment token**. For detailed instructions, refer to [Create enrollment tokens](/elastic/docs-builder/docs/3028/reference/fleet/fleet-enrollment-tokens#create-fleet-enrollment-tokens).
      4. Make sure **Enroll in Fleet** is selected.
      5. Select the appropriate platform, then copy the `elastic-agent install` command from the UI, and replace `install` with `enroll`.
      6. Run the modified command with elevated privileges from the directory where the agent is installed. For example:
         ```bash
         sudo ./elastic-agent enroll --url=<fleet-server-url> --enrollment-token=<token>
         ```
         Refer to the [command reference](/elastic/docs-builder/docs/3028/reference/fleet/agent-command-reference#elastic-agent-enroll-command) for details about the available options.
    - If the agent is no longer installed on the host, reinstall and enroll it in Fleet. Refer to [Install Fleet-managed Elastic Agents](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/install-fleet-managed-elastic-agent) for detailed instructions.
  </step>

  <step title="Resolve the underlying issues">
    Investigate the cause of the 401 errors and resolve the underlying issues to ensure proper agent functionality. 401 errors during check-in typically indicate authentication or authorization problems. Common causes include:
    - Expired or revoked API keys
    - Incorrect Fleet Server configuration
    - Issues with Elasticsearch authentication settings
  </step>
</stepper>

<admonition title="Agents are no longer automatically unenrolled" applies-to="Elastic Stack: Generally available since 9.1">
  The automatic unenrollment behavior is removed in Elastic Agent versions 8.19.0 and 9.1.0. Starting with these versions, Elastic Agents are no longer automatically unenrolled due to repeated 401 errors during check-in. When the issue causing the errors is resolved, the agents automatically reconnect to Fleet and resume ingesting data.
</admonition>


### Elastic Agent upgrade fails on Windows with exit status `0xc0000142`

During an Elastic Agent upgrade on Windows, Elastic Agent spawns a "watcher" process that monitors the upgrade process. Windows attempts to create a temporary console for this process. If Windows can't create this console, the watcher process initialization fails with error code `0xc0000142` (`STATUS_DLL_INIT_FAILED`), resulting in an upgrade failure. Elastic Agent logs this error at the `info` level.
The error is caused by Windows [desktop heap exhaustion](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory). When Elastic Agent runs as a [Windows service application](https://learn.microsoft.com/en-us/dotnet/framework/windows-services/introduction-to-windows-service-applications), it uses the service desktop, and shares the desktop heap with other running services. If a service process is using windowing resources, but is failing to release them, this can exhaust the desktop heap and affect Elastic Agent.
<note>
  Interactively-run instances of `elastic-agent.exe` are not subject to this limitation. Only instances running as a service are potentially affected.
</note>

To resolve the issue, try these tips:
- **Update Elastic Agent immediately after a system reboot**
  A system reboot destroys and recreates the desktop heap, resolving any prior exhaustion.
  Because many memory leaks are gradual, updating Elastic Agent immediately after a system reboot might allow Elastic Agent to upgrade before the memory leaking application exhausts the desktop heap.
  <tip>
  A [cold startup](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/distinguishing-fast-startup-from-wake-from-hibernation) resets kernel memory, but a fast startup or a wake from hibernation does not.
  A regular reboot (for example, `shutdown /r /t 0`) results in a cold startup, and resets the desktop heap.
  </tip>
- **Update third-party service applications**
  As standard Windows tools such as Task Manager and Process Explorer do not attribute desktop heap usage by application, you have to consider updating all third-party processes that are running as a service. To list these applications, use the following PowerShell command:
  ```powershell
  PS C:\> Get-Process | Where {$_.SI -eq 0} | Where {$_.MainModule.FileVersionInfo.ProductName -and (-not (($_.MainModule.FileVersionInfo.CompanyName -eq "Microsoft Corporation") -and ($_.MainModule.FileVersionInfo.ProductName -like "*Windows*"))) } | ForEach-Object { $_.MainModule.FileVersionInfo.ProductName + ' - ' + $_.Path }
  ```
  You can then install any updates from the listed applications' manufacturers.
- **Stop or uninstall third-party service applications**
  You can try terminating or uninstalling non-critical third-party service applications before updating Elastic Agent.
  Terminating a process releases its desktop heap resources.
  Note that the Elastic Agent update process does not require a significant amount of desktop heap resources, so a successful Elastic Agent update following the termination or uninstallation of a service application does not necessarily mean that the application was exhausting the desktop heap.
- **Resize the desktop heap**
  As a short-term solution, follow the steps described in the [Microsoft guide](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory) to increase the size of the desktop heap. If a service application is causing a memory leak, increasing the size of the desktop heap might only postpone the desktop heap exhaustion.


### Elastic Agent unenroll fails

In Fleet, if you delete an Elastic Agent policy that is associated with one or more inactive enrolled agents, an agent that later returns to a `Healthy` or `Offline` state cannot be unenrolled. Attempting to unenroll the agent results in an `Error unenrolling agent` message, and the unenrollment fails.
To resolve this problem, you can use the [Kibana Fleet APIs](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/fleet-api-docs) to force unenroll the agent.
To force unenroll a single Elastic Agent:
```shell
POST kbn:/api/fleet/agents/<agent_id>/unenroll
{
  "force": true,
  "revoke": true
}
```

To force unenroll a set of Elastic Agents in bulk:
```shell
POST kbn:/api/fleet/agents/bulk_unenroll
{
  "agents": ["<agent_id1>", "<agent_id2>"],
  "force": true,
  "revoke": true
}
```

We are also updating the Fleet UI to prevent removal of an Elastic Agent policy that is currently associated with any inactive agents.
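
The requests above use the Kibana Dev Tools Console syntax. Outside the Console, the same bulk call can be sent with `curl`. The following is a sketch only: the `bulk_unenroll_body` helper and the `KIBANA_URL` and `KIBANA_API_KEY` variables are hypothetical, and the authentication method is an assumption that might not match your deployment.

```shell
# bulk_unenroll_body: print the JSON request body for a forced bulk
# unenroll of the agent IDs given as arguments.
# Hypothetical helper for illustration; IDs are assumed to contain no
# characters that need JSON escaping.
bulk_unenroll_body() {
  ids=""
  for id in "$@"; do
    # Append each ID as a quoted JSON string, comma-separated.
    ids="${ids:+$ids,}\"$id\""
  done
  printf '{"agents":[%s],"force":true,"revoke":true}\n' "$ids"
}

# Example usage. KIBANA_URL and KIBANA_API_KEY are placeholder variables
# that you would set for your own deployment:
#   curl -X POST "$KIBANA_URL/api/fleet/agents/bulk_unenroll" \
#     -H "kbn-xsrf: true" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: ApiKey $KIBANA_API_KEY" \
#     -d "$(bulk_unenroll_body "<agent_id1>" "<agent_id2>")"
```

Building the body in a function keeps the agent IDs in one place, which is convenient when the list comes from another command.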

### Uninstalling Elastic Endpoint fails

When you uninstall Elastic Agent, all the programs managed by Elastic Agent, such as Elastic Endpoint, are also removed. If uninstalling fails, Elastic Endpoint might remain on your system.
To remove Elastic Endpoint, run the following commands:
<tab-set>
  <tab-item title="macOS">
    ```shell
    cd /tmp
    cp /Library/Elastic/Endpoint/elastic-endpoint elastic-endpoint
    sudo ./elastic-endpoint uninstall
    rm elastic-endpoint
    ```
  </tab-item>

  <tab-item title="Linux">
    ```shell
    cd /tmp
    cp /opt/Elastic/Endpoint/elastic-endpoint elastic-endpoint
    sudo ./elastic-endpoint uninstall
    rm elastic-endpoint
    ```
  </tab-item>

  <tab-item title="Windows">
    ```shell
    cd %TEMP%
    copy "c:\Program Files\Elastic\Endpoint\elastic-endpoint.exe" elastic-endpoint.exe
    .\elastic-endpoint.exe uninstall
    del .\elastic-endpoint.exe
    ```
  </tab-item>
</tab-set>

---


## Elastic Agent status


### Retrieve the Elastic Agent version

1. If you installed the Elastic Agent, run the following command (the example is for POSIX-based systems):
   ```shell
   elastic-agent version
   ```
2. If you have not installed the Elastic Agent and you are running it as a temporary process, you can run:
   ```shell
   ./elastic-agent version
   ```
   <note>
   Both of the above commands are accessible via Windows or macOS with their OS-specific slight variation in how you call them. If needed, refer to [*Install Elastic Agents*](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/install-elastic-agents) for examples of how to adjust them.
   </note>


### Check the Elastic Agent status

Run the following command to view the current status of the Elastic Agent.
```shell
elastic-agent status
```

Based on the information returned, you can take further action.
If Elastic Agent is running, but you do not get what you expect, here are some items to review:
1. In Fleet, click **Agents**. Check which policy is associated with the running Elastic Agent. If it is not the policy you expected, you can change it.
2. In Fleet, click **Agents**, and then select the Elastic Agent policy. Check for the integrations that should be included.
   For example, if you want to include system data, make sure the **System** integration is included in the policy.
3. Confirm if the **Collect agent logs** and **Collect agent metrics** options are selected.
   1. In Fleet, click **Agents**, and then select the Elastic Agent policy.
   2. Select the **Settings** tab. If you want to collect agent logs or metrics, select these options.
   <important>
   The **Elastic Cloud agent policy** is created only in Elastic Cloud deployments and, by default, does not include the collection of logs or metrics.
   </important>


### Some problems occur so early that insufficient logging is available

If some problems occur early and insufficient logging is available, run the following command:
```shell
./elastic-agent install -f
```

The stand-alone install command installs the Elastic Agent, and all of the service configuration is set up. You can now run the *enrollment* command. For example:
```shell
elastic-agent enroll --fleet-server-es=https://<es-url>:443 --fleet-server-service-token=<token> --fleet-server-policy=<policy-id>
```

Note: Port `443` is commonly used in Elastic Cloud. However, with self-managed deployments, your Elasticsearch might run on port `9200` or something entirely different.
For information on where to find agent logs, refer to our [FAQ](/elastic/docs-builder/docs/3028/troubleshoot/ingest/fleet/frequently-asked-questions#where-are-the-agent-logs).

### Elastic Agent is cited as `Healthy` but still has set up problems sending data to Elasticsearch

1. To confirm that the Elastic Agent is running and its status is `Healthy`, select the **Agents** tab.
   If you previously selected the **Collect agent logs** option, you can now look at the agent logs.
2. Click the agent name and then select the **Logs** tab.
   If no logs are displayed, it suggests a communication problem between your host and Elasticsearch. One possible reason is that the port is already in use.
3. You can check the port usage using tools like Wireshark or netstat. On a POSIX system, you can run the following command:
   ```shell
   netstat -nat | grep :8220
   ```
   Any response data indicates that the port is in use. This might be expected, or it might point to a problem, for example if you had intended to uninstall Fleet Server. In that case, re-check and continue.


### Elastic Agent is stuck in status `Updating`

A stuck Elastic Agent upgrade should be detected automatically, and you can [restart the upgrade](/elastic/docs-builder/docs/3028/reference/fleet/upgrade-elastic-agent#restart-upgrade-single) from Fleet.

## Authentication and access


### Elasticsearch authentication service fails with `Authentication using apikey failed` message

To save API keys and encrypt them in Elasticsearch, Fleet requires an encryption key.
To provide the encryption key, set the `xpack.encryptedSavedObjects.encryptionKey` property in the [`kibana.yml`](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/stack-settings) configuration file.
```yaml
xpack.encryptedSavedObjects.encryptionKey: "something_at_least_32_characters"
```
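
As a hedged illustration of the length requirement shown in the example value, the following sketch generates a random 32-character value and checks its length. Both function names are hypothetical, and the sketch assumes a system that provides `/dev/urandom`, `head -c`, and `od`.

```shell
# is_valid_encryption_key: succeed if the value is at least 32
# characters long. Hypothetical helper for illustration.
is_valid_encryption_key() {
  [ "${#1}" -ge 32 ]
}

# generate_encryption_key: print 32 random hexadecimal characters
# (16 random bytes read from /dev/urandom).
generate_encryption_key() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example usage:
#   key="$(generate_encryption_key)"
#   is_valid_encryption_key "$key" \
#     && echo "xpack.encryptedSavedObjects.encryptionKey: \"$key\""
```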


### Elastic Agent fails with `Agent process is not root/admin or validation failed` message

Ensure the user running Elastic Agent has root privileges as some integrations require root privileges to collect sensitive data.
If you’re running Elastic Agent in the foreground (and not as a service) on Linux or macOS, run the agent under the root user: `sudo` or `su`.
If you’re using the Elastic Defend integration, make sure you’re running Elastic Agent under the SYSTEM account.
<tip>
  If you install Elastic Agent as a service as described in [*Install Elastic Agents*](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/install-elastic-agents), Elastic Agent runs under the SYSTEM account by default.
</tip>

To run Elastic Agent under the SYSTEM account, you can do the following:
1. Download [PsExec](https://docs.microsoft.com/en-us/sysinternals/downloads/psexec) and extract the contents to a folder. For example, `d:\tools`.
2. Open a command prompt as an Administrator (right-click the command prompt icon and select **Run As Administrator**).
3. From the command prompt, run Elastic Agent under the SYSTEM account:
   ```sh
   d:\tools\psexec.exe -sid "C:\Program Files\Elastic-Agent\elastic-agent.exe" run
   ```


### API key is unauthorized to send telemetry to `.logs-endpoint.diagnostic.collection-*` indices

By default, telemetry is turned on in the Elastic Stack to help us learn about the features that our users are most interested in. This helps us to focus our efforts on making features even better.
If you’ve recently upgraded from version `7.10` to `7.11`, you might see the following message when you view Elastic Defend logs:
```sh
action [indices:admin/auto_create] is unauthorized for API key id [KbvCi3YB96EBa6C9k2Cm]
of user [fleet_enroll] on indices [.logs-endpoint.diagnostic.collection-default]
```

The above message indicates that Elastic Endpoint does not have the correct permissions to send telemetry. This is a known problem in 7.11 that will be fixed in an upcoming patch release.
To remove this message from your logs, you can turn off telemetry for the Elastic Defend integration until the next patch release is available.
1. In Kibana, click **Integrations**, and then select the **Manage** tab.
2. Click **Elastic Defend**, and then select the **Policies** tab to view all the installed integrations.
3. Click the integration to edit it.
4. Under advanced settings, set `windows.advanced.diagnostic.enabled` to `false`, and then save the integration.


### Error when running Elastic Agent commands with `sudo`

On Linux systems, when you install Elastic Agent [without administrative privileges](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/elastic-agent-unprivileged), that is, using the `--unprivileged` flag, Elastic Agent commands should not be run with `sudo`. Doing so can result in an error due to the agent not having the required privileges.
For example, when you run Elastic Agent with the `--unprivileged` flag, running the `elastic-agent inspect` command will result in an error like the following:
```sh
Error: error loading agent config: error loading raw config: fail to read configuration /Library/Elastic/Agent/fleet.enc for the elastic-agent: fail to decode bytes: cipher: message authentication failed
```

To resolve this, either install Elastic Agent without the `--unprivileged` flag so that it has administrative access, or run the Elastic Agent commands without the `sudo` prefix.

## Fleet Server and Elastic Agent


### On Fleet Server startup, ERROR seen with `State changed to CRASHED: exited with code: 1`

You might get this error message for a number of different reasons. A common cause in production-like setups is that the `ca.crt` file passed in cannot be found. To verify whether this is the problem, bootstrap Fleet Server without passing a `ca.crt` file. This means that you temporarily test any subsequent Elastic Agent installs with Fleet Server's own self-signed certificate.
<tip>
  Make sure to pass in the full path to the `ca.crt` file. A relative path does not work.
</tip>

You can tell that Fleet Server is using its test-oriented self-signed certificate when you see the following error during Elastic Agent installs:
```sh
Error: fail to enroll: fail to execute request to fleet-server: x509: certificate signed by unknown authority
Error: enroll command failed with exit code: 1
```

To install or enroll an Elastic Agent against a Fleet Server that uses a self-signed certificate, add the `--insecure` option to the command:
```sh
sudo ./elastic-agent install --url=https://<fleet-server-ip>:8220 --enrollment-token=<token> --insecure
```

For more information, refer to [Elastic Agent enrollment fails on the host with `x509: certificate signed by unknown authority` message](#agent-enrollment-certs).

### Fleet Server is running and healthy with data, but other Agents cannot use it to connect to Elasticsearch

Some settings are only used when you have multiple Elastic Agents. If this is the case, check to be sure that the hosts can communicate with the Fleet Server.
From the non-Fleet Server host, run the following command:
```shell
curl -f http://<fleet-server-ip>:8220/api/status
```

The response might yield errors that you can debug further, or it might work and show that communication ports and networking are not the problems.
One common problem is that the default Fleet Server port of `8220` isn’t open on the Fleet Server host to communicate. You can review and correct this using common tools in alignment with any networking and security concerns that you have.

## Elastic Agent and integrations


### Integration policy upgrade has too many conflicts

If you try to upgrade an integration policy that is several versions old, there might be substantial conflicts or configuration issues. You might save time by creating a new policy, testing it, and rolling out the integration upgrade to additional hosts rather than trying to fix these problems.
After [upgrading the integration](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/upgrade-integration):
1. [Create a new policy](/elastic/docs-builder/docs/3028/reference/fleet/agent-policy#create-a-policy).
2. [Add the integration to the policy](/elastic/docs-builder/docs/3028/reference/fleet/agent-policy#add-integration). The later version is automatically used.
3. [Apply the policy](/elastic/docs-builder/docs/3028/reference/fleet/agent-policy#apply-a-policy) to an Elastic Agent.
   <tip>
   In larger deployments, you should test integration upgrades on a sample Elastic Agent before rolling out a larger upgrade initiative. Only after a small trial is deemed successful should the updated policy be rolled out to all hosts.
   </tip>
4. Roll out the integration update to additional hosts:
   1. In Fleet, click **Agent policies**. Click on the name of the policy you want to edit.
   2. Search or scroll to a specific integration. Open the **Actions** menu and select **Delete integration**.
   3. Click **Add integration** and re-add the freshly deleted integration. The updated version will be used and applied to all Elastic Agents.
   4. Repeat this process for each policy with the out-of-date integration.
   <note>
   In some instances, for example, when there are hundreds or thousands of different Elastic Agents and policies that need to be updated, this upgrade path is not feasible. In this case, update one policy and use the [Copy a policy](/elastic/docs-builder/docs/3028/reference/fleet/agent-policy#copy-policy) action to apply the updated policy versions to additional policies. The downside of this method is that you lose the ability to assess integration version changes individually across policies.
   </note>


### Elastic Agents are unable to connect after removing the Fleet Server integration

When you use Fleet-managed Elastic Agent, at least one Elastic Agent needs to be running the [Fleet Server integration](https://docs.elastic.co/integrations/fleet_server). If the policy containing this integration is accidentally removed from that Elastic Agent, none of the other agents can be managed. However, the Elastic Agents will continue to send data to their configured output.
There are two approaches to fixing this issue, depending on whether or not the Elastic Agent that was running the Fleet Server integration is still installed and healthy (but is now running another policy).
To recover the Elastic Agent:
1. In Fleet, open the **Agents** tab and click **Add agent**.
2. In the **Add agent** flyout, select an agent policy that contains the **Fleet Server** integration. On Elastic Cloud you can use the **Elastic Cloud agent policy** which includes the integration.
3. Follow the instructions in the flyout, and stop before running the CLI commands.
4. Depending on the state of the original Fleet Server Elastic Agent, do one of the following:
   - **The original Fleet Server Elastic Agent is still running and healthy**
     In this case, you only need to re-enroll the agent with Fleet:
     1. Copy the `elastic-agent install` command from the Kibana UI.
     2. In the command, replace `install` with `enroll`.
     3. In the directory where Elastic Agent is running (for example `/opt/Elastic/Agent/` on Linux), run the command as `root`.
        For example, if Kibana gives you the command:
        ```sh
        sudo ./elastic-agent install --url=https://fleet-server:8220 --enrollment-token=bXktc3VwZXItc2VjcmV0LWVucm9sbWVudC10b2tlbg==
        ```
        Instead run:
        ```sh
        sudo ./elastic-agent enroll --url=https://fleet-server:8220 --enrollment-token=bXktc3VwZXItc2VjcmV0LWVucm9sbWVudC10b2tlbg==
        ```
   - **The original Fleet Server Elastic Agent is no longer installed**
     In this case, you need to install the agent again:
     1. Copy the commands from the Kibana UI. The commands don’t need to be changed.
     2. Run the commands in order. The first three commands will download a new Elastic Agent install package, expand the archive, and change directories.
        The final command will install Elastic Agent. For example:
        ```sh
        sudo ./elastic-agent install --url=https://fleet-server:8220 --enrollment-token=bXktc3VwZXItc2VjcmV0LWVucm9sbWVudC10b2tlbg==
        ```

After running these steps, your Elastic Agents should be able to connect with Fleet again.

### illegal_argument_exception when TSDB is enabled

When you use an Elastic Agent integration in which TSDB (Time Series Database) is enabled, you might encounter an `illegal_argument_exception` error in the Fleet UI.
This can occur if you have a component template defined that includes a `_source` attribute, which conflicts with the `_source: synthetic` setting used when TSDB is enabled.
For details about the error and how to resolve it, refer to the section `Runtime fields cannot be used in TSDB indices` in the Innovation Hub article [TSDB enabled integrations for Elastic Agent](https://support.elastic.co/knowledge/9363b9fd).

### The `/api/fleet/setup` endpoint can’t reach the package registry to install Integrations

To install Integrations, the Fleet app requires a connection to an external service called the Elastic Package Registry.
For this to work, the Kibana server must connect to `https://epr.elastic.co` on port `443`.
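
As a quick way to confirm that requirement, the following sketch probes the registry URL with `curl`. The helper name `check_registry` is hypothetical and not part of any Elastic tooling. The 10-second timeout is an arbitrary choice.

```sh
# Sketch: report whether a package registry URL answers over HTTPS.
# check_registry is a hypothetical helper name, not an Elastic command.
check_registry() {
  registry_url="${1:-https://epr.elastic.co}"
  # -sS: no progress bar, but still print errors.
  # --max-time 10: give up after 10 seconds (arbitrary value).
  # -w '%{http_code}': print only the HTTP status code.
  if status=$(curl -sS --max-time 10 -o /dev/null -w '%{http_code}' "$registry_url"); then
    echo "reachable: $registry_url (HTTP $status)"
  else
    echo "unreachable: $registry_url" >&2
    return 1
  fi
}
```

Run `check_registry` on the host where the Kibana server runs, because that is the process that needs the connection. Any HTTP status in the output means the connection itself succeeded. A failure usually points to DNS, firewall, proxy, or certificate problems between that host and the registry.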

## Elastic Cloud and Kibana


### Fleet in Kibana crashes

1. To investigate the error, open your browser’s development console.
2. Select the **Network** tab, and refresh the page.
   One of the requests to the Fleet API will most likely have returned an error. If the error message doesn’t give you enough information to fix the problem, contact us in the [discuss forum](https://discuss.elastic.co/).


### Hosted Elastic Agent is offline

To scale the Fleet Server deployment, Elastic Cloud starts new containers or shuts down old ones when hosted Elastic Agents are required or no longer needed. The old Elastic Agents remain in the Agents list for 24 hours and then disappear automatically.

### Elastic Agents hosted on Elastic Cloud are stuck in `Updating` or `Offline`

In Elastic Cloud, after [upgrading](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/upgrade-integration) Fleet Server and its integration policies, agents enrolled in the Elastic Cloud agent policy might experience issues updating. To resolve this problem:
1. In a terminal window, run this `cURL` request, providing your Kibana superuser credentials to reset the Elastic Cloud agent policy:
   ```shell
   curl -u <username>:<password> --request POST \
     --url <kibana_url>/internal/fleet/reset_preconfigured_agent_policies/policy-elastic-agent-on-cloud \
     --header 'content-type: application/json' \
     --header 'kbn-xsrf: xyz' \
     --header 'elastic-api-version: 1'
   ```
2. Force unenroll the agent stuck in `Updating`:
   1. To find the agent’s ID, go to **Fleet > Agents** and click the agent to see its details. Copy the Agent ID.
   2. In a terminal window, run:
      ```shell
      curl -u <username>:<password> --request POST \
        --url <kibana_url>/api/fleet/agents/<agentID>/unenroll \
        --header 'content-type: application/json' \
        --header 'kbn-xsrf: xx' \
        --data-raw '{"force":true,"revoke":true}' \
        --compressed
      ```
      Where `<agentID>` is the ID you copied in the previous step.
3. Restart the Integrations Server:
   In the Elastic Cloud console under Integrations Server, click **Force Restart**.


### When using Elastic Cloud, Fleet Server is not listed in Kibana

If Fleet Server does not appear in Kibana, make sure that it’s set up.
To set up Fleet Server on Elastic Cloud:
1. Go to your deployment on Elastic Cloud.
2. Follow the Elastic Cloud prompts to set up **Integrations Server**. Once complete, the Fleet Server Elastic Agent will show up in Fleet.

To enable Fleet and set up Fleet Server on a self-managed cluster:
1. In the Elasticsearch configuration file, [`config/elasticsearch.yml`](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/stack-settings), set the following security settings to enable security and API keys:
   ```yaml
   xpack.security.enabled: true
   xpack.security.authc.api_key.enabled: true
   ```
2. In the Kibana configuration file, [`config/kibana.yml`](https://www.elastic.co/elastic/docs-builder/docs/3028/deploy-manage/stack-settings), enable Fleet and specify your user credentials:
   ```yaml
   xpack.encryptedSavedObjects.encryptionKey: "something_at_least_32_characters"
   elasticsearch.username: "my_username" 
   elasticsearch.password: "my_password"
   ```
   To set up passwords, you can use the documented Elasticsearch APIs or the `elasticsearch-setup-passwords` command. For example, `./bin/elasticsearch-setup-passwords auto`.
   After running the command:
   1. Copy the Elastic user name to the Kibana configuration file.
   2. Restart Kibana.
   3. Follow the documented steps for setting up a self-managed Fleet Server. For more information, refer to [What is Fleet Server?](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/fleet-server).


## Elastic Agent on Kubernetes


### Elastic Agent Out of Memory errors on Kubernetes

In a Kubernetes environment, Elastic Agent might be stopped with reason `OOMKilled` due to inadequate available memory.
To detect the problem, run the `kubectl describe pod` command and check the results for the following content:
```sh
       Last State:   Terminated
       Reason:       OOMKilled
       Exit Code:    137
```

To resolve the problem, allocate additional memory to the agent and then restart it.
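
The detection step can be scripted. The following minimal sketch reads `kubectl describe pod` output on standard input and succeeds only when it contains the `OOMKilled` reason shown above. The helper name `pod_was_oom_killed` is hypothetical.

```sh
# Sketch: succeed when `kubectl describe pod` output (read from stdin)
# contains a "Reason: OOMKilled" line.
# pod_was_oom_killed is a hypothetical helper name.
pod_was_oom_killed() {
  grep -Eq 'Reason:[[:space:]]+OOMKilled'
}
```

For example, `kubectl describe pod -n kube-system <name_of_elastic_agent_pod> | pod_was_oom_killed && echo "OOMKilled"` prints `OOMKilled` only for an affected Pod. The `kube-system` namespace is an assumption based on the default manifests. To allocate more memory, you would typically raise the memory values under `resources` for the agent container in the manifest that deploys it.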

### Troubleshoot Elastic Agent installation on Kubernetes, with Kustomize

Potential issues during Elastic Agent installation on Kubernetes can be categorized into two main areas:
- [Problems related to the creation of objects within the manifest](#agent-kustomize-manifest).
- [Failures occurring within specific components after installation](#agent-kustomize-after).


#### Problems related to the creation of objects within the manifest

When troubleshooting installations performed with [Kustomize](https://github.com/kubernetes-sigs/kustomize), it’s good practice to inspect the output of the rendered manifest. To do this, take the installation command provided by Kibana Onboarding and replace the final part, `| kubectl apply -f-`, with a redirection to a local file. This allows for easier analysis of the rendered output.
For example, the following command, originally provided by Kibana for an Elastic Agent Standalone installation, has been modified to redirect the output for troubleshooting purposes:
```sh
kubectl kustomize https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone\?ref\=v8.15.3 | sed -e 's/JUFQSV9LRVkl/ZDAyNnZaSUJ3eWIwSUlCT0duRGs6Q1JfYmJoVFRUQktoN2dXTkd0FNMtdw==/g' -e "s/%ES_HOST%/https:\/\/7a912e8674a34086eacd0e3d615e6048.us-west2.gcp.elastic-cloud.com:443/g" -e "s/%ONBOARDING_ID%/db687358-2c1f-4ec9-86e0-8f1baa4912ed/g" -e "s/\(docker.elastic.co\/beats\/elastic-agent:\).*$/\18.15.3/g" -e "/{CA_TRUSTED}/c\ " > elastic_agent_installation_complete_manifest.yaml
```

The previous command generates a local file named `elastic_agent_installation_complete_manifest.yaml`, which you can use for further analysis. It contains the complete set of resources required for the Elastic Agent installation, including:
- RBAC objects (`ServiceAccounts`, `Roles`, etc.)
- `ConfigMaps` and `Secrets` for Elastic Agent configuration
- Elastic Agent Standalone deployed as a `DaemonSet`
- [Kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) deployed as a `Deployment`

The content of this file is equivalent to what you’d obtain by following the [Run Elastic Agent Standalone on Kubernetes](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/running-on-kubernetes-standalone) steps, with the exception that `kube-state-metrics` is not included in the standalone method.

**Possible issues**
- If your user doesn’t have **cluster-admin** privileges, the RBAC resources creation might fail.
- Some Kubernetes security mechanisms (like [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)) could cause part of the manifest to be rejected, as `hostNetwork` access and `hostPath` volumes are required.
- If you already have an installation of `kube-state-metrics`, it could cause part of the manifest installation to fail or to update your existing resources without notice.
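
Before applying the rendered manifest, it can help to list what it would create. The following sketch counts the top-level `kind:` entries in a manifest file. The helper name `manifest_kinds` is hypothetical, and the sketch assumes each resource has its `kind:` key at the start of a line, as in `kubectl kustomize` output.

```sh
# Sketch: print each top-level resource kind in a rendered manifest
# together with how many times it appears, as KIND=COUNT.
# manifest_kinds is a hypothetical helper name.
manifest_kinds() {
  # Only match `kind:` at column 0, so nested keys such as
  # roleRef.kind or subjects[].kind are ignored.
  grep -E '^kind:' "$1" | awk '{print $2}' | sort | uniq -c | awk '{print $2 "=" $1}'
}
```

For example, `manifest_kinds elastic_agent_installation_complete_manifest.yaml` shows whether the file contains cluster-scoped RBAC objects or a `kube-state-metrics` `Deployment` that might collide with an existing installation. If your cluster supports it, `kubectl apply --dry-run=server -f elastic_agent_installation_complete_manifest.yaml` is another way to surface rejections without changing any resources.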


#### Failures occurring within specific components after installation

If the installation is correct and all resources are deployed, but data is not flowing as expected (for example, you don’t see any data on the **[Metrics Kubernetes] Cluster Overview** dashboard), check the following items:
1. Check resources status and ensure they are all in a `Running` state:
   ```sh
   kubectl get pods -n kube-system | grep elastic
   kubectl get pods -n kube-system | grep kube-state-metrics
   ```
   <note>
   The default configuration assumes that both `kube-state-metrics` and the Elastic Agent `DaemonSet` are deployed in the **same namespace** for communication purposes. If you change the namespace of any of the components, the agent configuration will need further policy updates.
   </note>
2. Describe the Pods if they are in a `Pending` state:
   ```sh
   kubectl describe pod -n kube-system <name_of_elastic_agent_pod>
   ```
3. Check the logs of the Elastic Agent and `kube-state-metrics` Pods, and look for errors or warnings:
   ```sh
   kubectl logs -n kube-system <name_of_elastic_agent_pod>
   kubectl logs -n kube-system <name_of_elastic_agent_pod> | grep -i error
   kubectl logs -n kube-system <name_of_elastic_agent_pod> | grep -i warn
   ```
   ```sh
   kubectl logs -n kube-system <name_of_kube-state-metrics_pod>
   ```

**Possible issues**
- Connectivity, authorization, or authentication issues when connecting to Elasticsearch:
  Ensure that the API key and Elasticsearch destination endpoint used during the installation are correct, and that the endpoint is reachable from within the Pods.
  In an already installed system, the API Key is stored in a `Secret` named `elastic-agent-creds-<hash>`, and the endpoint is configured in the `ConfigMap` `elastic-agent-configs-<hash>`.
- Missing cluster-level metrics (provided by `kube-state-metrics`):
  As described in [Run Elastic Agent Standalone on Kubernetes](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/running-on-kubernetes-standalone), the Elastic Agent Pod acting as `leader` is responsible for retrieving cluster-level metrics from `kube-state-metrics` and delivering them to [data streams](https://www.elastic.co/elastic/docs-builder/docs/3028/manage-data/data-store/data-streams) prefixed as `metrics-kubernetes.state_<resource>`. In order to troubleshoot a situation where these metrics are not appearing:
  1. Determine which Pod owns the [leadership](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/kubernetes_leaderelection-provider) `lease` in the cluster, with:
     ```sh
     kubectl get lease -n kube-system elastic-agent-cluster-leader
     ```
  2. Check the logs of that Pod to see if there are errors when connecting to `kube-state-metrics` and if the `state_*` metrics are being sent to Elasticsearch.
     One way to check if `state_*` metrics are being delivered to Elasticsearch is to inspect log lines with the `"Non-zero metrics in the last 30s"` message and check the values of the `state_*` metrics within the line, with something like:
     ```sh
     kubectl logs -n kube-system elastic-agent-xxxx | grep "Non-zero metrics" | grep "state_"
     ```
     If the previous command returns `"state_pod":{"events":213,"success":213}` or similar for all `state_*` metrics, it means the metrics are being delivered.
  3. As a last resort, if you believe none of the Pods is acting as a leader, you can try deleting the `lease` to generate a new one:
     ```sh
     kubectl delete lease -n kube-system elastic-agent-cluster-leader
     # wait a few seconds and check for the lease again
     kubectl get lease -n kube-system elastic-agent-cluster-leader
     ```
- Performance problems:
  Monitor the CPU and memory usage of the agent Pods and adjust the manifest requests and limits as needed. Refer to [Scaling Elastic Agent on Kubernetes](https://www.elastic.co/elastic/docs-builder/docs/3028/reference/fleet/scaling-on-kubernetes) for more details about the needed resources.
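
The leader and `state_*` checks for missing cluster-level metrics can be wrapped in two small helpers. This is a sketch only: the helper names are hypothetical, and the counter pattern assumes log entries shaped like the `"state_pod":{"events":213,"success":213}` sample, with no nested braces inside each counter object.

```sh
# Sketch: print the identity recorded as the current holder of the leader lease.
# lease_holder is a hypothetical helper name.
lease_holder() {
  kubectl get lease -n kube-system elastic-agent-cluster-leader \
    -o jsonpath='{.spec.holderIdentity}'
}

# Sketch: read agent log lines on stdin and print the distinct
# "state_<resource>":{...} counters found in "Non-zero metrics" lines.
# state_metric_counters is a hypothetical helper name.
state_metric_counters() {
  grep 'Non-zero metrics' | grep -o '"state_[a-z_]*":{[^}]*}' | sort -u
}
```

For example, `kubectl logs -n kube-system elastic-agent-xxxx | state_metric_counters` prints one line per `state_*` metricset. Empty output from the leader Pod suggests that no `state_*` events were reported in the inspected log lines.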

Extra resources for Elastic Agent on Kubernetes troubleshooting and information:
- [Elastic Agent Out of Memory errors on Kubernetes](#agent-oom-k8s).
- [Elastic Agent Kustomize Templates](https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-kustomize/default) documentation and resources.
- Other examples and manifests to deploy [Elastic Agent on Kubernetes](https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes).


### Troubleshoot Elastic Agent on Kubernetes seeing `invalid api key to authenticate with fleet` in logs

If an agent was unenrolled from a Kubernetes cluster, there might be data remaining in `/var/lib/elastic-agent-managed/kube-system/state` on the node(s). Re-enrolling an agent later on the same nodes might then result in `invalid api key to authenticate with fleet` error messages.
To avoid these errors, make sure to delete this state folder before enrolling a new agent.
For more information, refer to issue [#3586](https://github.com/elastic/elastic-agent/issues/3586).
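
A minimal sketch of that cleanup, meant to be run on each affected node with sufficient privileges, might look like the following. The helper name `remove_agent_state` is hypothetical. The default path is the one mentioned above, so pass a different path if your deployment stores agent state elsewhere.

```sh
# Sketch: delete a leftover Elastic Agent state folder if it exists.
# remove_agent_state is a hypothetical helper name.
remove_agent_state() {
  state_dir="${1:-/var/lib/elastic-agent-managed/kube-system/state}"
  if [ -d "$state_dir" ]; then
    # `--` guards against paths that start with a dash.
    rm -rf -- "$state_dir"
    echo "removed $state_dir"
  else
    echo "nothing to remove at $state_dir"
  fi
}
```

Run it only after the old agent has been unenrolled and its Pod is no longer running on the node, because the deletion cannot be undone.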

## Air-gapped environments


### Kibana cannot connect to Elastic Package Registry in air-gapped environments

In air-gapped environments, you might encounter an error if you’re using a custom Certificate Authority (CA) that is not available to Kibana:
```json
{"type":"log","@timestamp":"2022-03-02T09:58:36-05:00","tags":["error","plugins","fleet"],"pid":58716,"message":"Error connecting to package registry: request to https://customer.server.name:8443/categories?experimental=true&include_policy_templates=true&kibana.version=7.17.0 failed, reason: self signed certificate in certificate chain"}
```

To fix this problem, add your CA certificate file path to the Kibana startup file by defining the `NODE_EXTRA_CA_CERTS` environment variable. For more information, refer to the [TLS configuration of the Elastic Package Registry](/elastic/docs-builder/docs/3028/reference/fleet/air-gapped#air-gapped-tls) section.
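
As an illustration, the following sketch exports `NODE_EXTRA_CA_CERTS` only when the given file is readable and contains a PEM certificate header. The helper name `use_extra_ca` is hypothetical, and the file path is whatever location holds your own CA certificate.

```sh
# Sketch: export NODE_EXTRA_CA_CERTS only if the file looks like a PEM certificate.
# use_extra_ca is a hypothetical helper name.
use_extra_ca() {
  ca_file="$1"
  # -e keeps grep from treating the leading dashes as options.
  if [ -r "$ca_file" ] && grep -q -e '-----BEGIN CERTIFICATE-----' "$ca_file"; then
    export NODE_EXTRA_CA_CERTS="$ca_file"
  else
    echo "not a readable PEM certificate: $ca_file" >&2
    return 1
  fi
}
```

The variable must be present in the environment of the Kibana process when it starts. An `export` in an interactive shell only affects processes launched from that shell, so for a service-managed Kibana, set the variable in the service's own startup or environment file instead.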