Loading

ES|QL and Pandas

The Elasticsearch Query Language (ES|QL) provides a powerful way to filter, transform, and analyze data stored in Elasticsearch. Designed to be easy to learn and use, it is a perfect fit for data scientists familiar with Pandas and other dataframe-based libraries. ES|QL queries produce tables with named columns, which is the definition of dataframes.

This page shows you an example of using ES|QL and Pandas together to work with dataframes.

Use the employees sample data and mapping. The easiest way to load this dataset is to run two Elasticsearch API requests in the Kibana Console.

Use the ES|QL CSV import to convert the employees dataset to a Pandas dataframe object.

from io import StringIO
from elasticsearch import Elasticsearch
import pandas as pd
client = Elasticsearch(
    "https://[host].elastic-cloud.com",
    api_key="...",
)
response = client.esql.query(
    query="FROM employees | LIMIT 500",
    format="csv",
)
df = pd.read_csv(StringIO(response.body))
print(df)

Even though the dataset contains only 100 records, a LIMIT of 500 is specified to suppress ES|QL warnings about potentially missing records. This prints the following dataframe:

    avg_worked_seconds  ...  salary_change.long still_hired
0            268728049  ...                   1        True
1            328922887  ...            [-7, 11]        True
2            200296405  ...            [12, 14]       False
3            311267831  ...       [0, 1, 3, 13]        True
4            244294991  ...            [-2, 13]        True
..                 ...  ...                 ...         ...
95           204381503  ...                 NaN       False
96           206258084  ...                  -1       False
97           272392146  ...          [-2, 4, 8]       False
98           377713748  ...    [-8, -3, 10, 14]        True
99           223910853  ...            [-7, 13]        True

You can now analyze the data with Pandas or you can also continue transforming the data using ES|QL.

In the next example, the STATS …​ BY command is utilized to count how many employees are speaking a given language. The results are sorted with the languages column using SORT:

response = client.esql.query(
    query="""
    FROM employees
    | STATS count = COUNT(emp_no) BY languages
    | SORT languages
    | LIMIT 500
    """,
    format="csv",
)
df = pd.read_csv(
    StringIO(response.body),
    dtype={"count": "Int64", "languages": "Int64"},
)
print(df)

Note that the dtype parameter of pd.read_csv() is useful when the type inferred by Pandas is not enough. The code prints the following response:

   count  languages
0     15          1
1     19          2
2     17          3
3     18          4
4     21          5

Use the built-in parameters support of the ES|QL REST API to pass parameters to a query:

response = client.esql.query(
    query="""
    FROM employees
    | STATS count = COUNT(emp_no) BY languages
    | WHERE languages >= (?)
    | SORT languages
    | LIMIT 500
    """,
    format="csv",
    params=[3],
)
df = pd.read_csv(
    StringIO(response.body),
    dtype={"count": "Int64", "languages": "Int64"},
)
print(df)

The code above outputs the following:

   count  languages
0     17          3
1     18          4
2     21          5

If you want to learn more about ES|QL, refer to the ES|QL documentation. You can also check out this other Python example using Boston Celtics data.