Token usage in Elastic Agent Builder

Serverless Elasticsearch: Preview | Serverless Observability: Unavailable | Serverless Security: Unavailable | Stack: Preview 9.2.0

When you use Elastic Agent Builder, total token usage typically exceeds the token count of the visible conversation text. Because Elastic Agent Builder uses an agentic framework, a single user request often triggers multiple model calls (rounds) to work through reasoning steps, run tools, and interpret results.

Token counts include:

Input Tokens: These accumulate throughout the session. They include the user's current query, the conversation history from previous rounds, system prompts, and the results returned from any tools used during execution.
Output Tokens: These include the final response visible to the user, as well as all internal reasoning steps, tool calls, and intermediate results generated by the model.
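
The accounting described above can be illustrated with a minimal sketch. It is a simplified model, not Elastic's actual metering logic: the `Round` structure, the `estimate_usage` function, and every token figure are hypothetical, and the sketch assumes that each model call resends the full accumulated context (system prompt, query, earlier model output, and tool results) as input.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One model call within a single user request (hypothetical structure)."""
    output_tokens: int        # reasoning, a tool call, or the final answer
    tool_result_tokens: int   # tool output fed back into the next round


def estimate_usage(system_tokens: int, query_tokens: int,
                   rounds: list[Round]) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) across all rounds.

    Assumption: every call resends the whole context built up so far.
    """
    context = system_tokens + query_tokens  # what the first call sends
    total_in = 0
    total_out = 0
    for r in rounds:
        total_in += context            # the full context counts as input again
        total_out += r.output_tokens
        # The model's output and any tool result join the context
        # that the next call will send.
        context += r.output_tokens + r.tool_result_tokens
    return total_in, total_out


# Illustrative numbers only, not measurements.
rounds = [
    Round(output_tokens=50, tool_result_tokens=400),   # model requests a tool
    Round(output_tokens=60, tool_result_tokens=300),   # model requests another tool
    Round(output_tokens=150, tool_result_tokens=0),    # final visible answer
]
total_in, total_out = estimate_usage(system_tokens=500, query_tokens=20, rounds=rounds)
print(total_in, total_out)  # 2820 260
```

In this example the visible conversation is only the 20-token query and the 150-token answer, yet the three rounds together account for 2,820 input tokens and 260 output tokens under the stated assumptions.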

Note

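As the conversation history grows and the agent performs more complex reasoning loops, token usage compounds. Each round of execution sends the accumulated history and tool results as input again, so the input token count rises with every round, and longer reasoning loops also generate more output tokens.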

For more information on billing and token costs, refer to Elastic pricing.