Loading

ES|QL INLINE STATS command

The INLINE STATS processing command groups rows according to a common value and calculates one or more aggregated values over the grouped rows. The results are appended as new columns to the input rows.

The command is identical to STATS except that it preserves all the columns from the input table.

INLINE STATS [column1 =] expression1 [WHERE boolean_expression1][,
      ...,
      [columnN =] expressionN [WHERE boolean_expressionN]]
      [BY [grouping_name1 =] grouping_expression1[,
          ...,
          [grouping_nameN = ] grouping_expressionN]]
		
columnX
The name by which the aggregated value is returned. If omitted, the name is equal to the corresponding expression (expressionX). If multiple columns have the same name, all but the rightmost column with this name will be ignored.
expressionX
An expression that computes an aggregated value.
grouping_expressionX
An expression that outputs the values to group by. If its name coincides with one of the existing or computed columns, that column will be overridden by this one.
boolean_expressionX
The condition that determines which rows are included when evaluating expressionX.
Note

Individual null values are skipped when computing aggregations.

The INLINE STATS processing command groups rows according to a common value (also known as the grouping key), specified after BY, and calculates one or more aggregated values over the grouped rows. The output table contains the same number of rows as the input table. The command only adds new columns or overrides existing columns with the same name as the result.

If column names overlap, existing column values may be overridden and column order may change. The new columns are added/moved so that they appear in the order they are defined in the INLINE STATS command.

For the calculation of each aggregated value, the rows in a group can be filtered with WHERE. If BY is omitted the aggregations are applied over the entire dataset.

The following aggregation functions are supported:

  • ABSENT

    Returns true if the input expression yields no non-null values.

  • AVG

    Returns the average of a numeric field.

  • COUNT

    Returns the total number of input values.

  • COUNT_DISTINCT

    Returns the approximate number of distinct values.

  • EARLIEST

    Returns the earliest value of a field sorted by timestamp.

  • FIRST

    Returns the earliest occurrence of a field based on a sort field.

  • LAST

    Returns the latest occurrence of a field based on a sort field.

  • LATEST

    Returns the latest value of a field sorted by timestamp.

  • MAX

    Returns the maximum value of a field.

  • MEDIAN

    Returns the median value of a numeric field.

  • MEDIAN_ABSOLUTE_DEVIATION

    Returns the median absolute deviation, a robust measure of variability.

  • MIN

    Returns the minimum value of a field.

  • PERCENTILE

    Returns the value at which a certain percentage of observed values occur.

  • PRESENT

    Returns true if the input expression yields any non-null values.

  • SAMPLE

    Collects sample values for a field.

  • SPARKLINE

    Computes y-axis values of a sparkline graph for an aggregation over time.

  • ST_CENTROID_AGG

    Calculates the spatial centroid over a field with geometry type.

  • ST_EXTENT_AGG

    Calculates the spatial extent bounding box over a field with geometry type.

  • STD_DEV

    Returns the population standard deviation of a numeric field.

  • SUM

    Returns the sum of a numeric expression.

  • TOP

    Collects the top values for a field, including repeated values.

  • VALUES

    Returns unique deduplicated values as a multivalued field.

  • VARIANCE

    Returns the population variance of a numeric field.

  • WEIGHTED_AVG

    Returns the weighted average of a numeric expression.

The following grouping functions are supported:

The following examples show common INLINE STATS patterns.

Calculate a statistic on one column and group by the values of another:

FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary) BY languages
		
emp_no:integer salary:integer max_salary:integer languages:integer
10001 57305 73578 2
10002 56371 66817 5
10003 61805 74572 4
10004 36174 66817 5
10005 63528 73717 1
Note

The languages column moves to the last position in the output table because it is a column overridden by the INLINE STATS command (it's the grouping key) and it is the last column defined by it.

Omit BY to apply aggregations over the entire dataset. The order of the existing columns is preserved and the new column is added last:

FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary)
		
emp_no:integer languages:integer salary:integer max_salary:integer
10001 2 57305 74999
10002 5 56371 74999
10003 4 61805 74999
10004 5 36174 74999
10005 1 63528 74999
FROM employees
| WHERE still_hired
| KEEP emp_no, languages, salary, hire_date
| EVAL tenure = DATE_DIFF("year", hire_date, "2025-09-18T00:00:00")
| DROP hire_date
| INLINE STATS avg_salary = AVG(salary), count = count(*) BY languages, tenure
		
emp_no:integer salary:integer avg_salary:double count:long languages:integer tenure:integer
10001 57305 51130.5 2 2 39
10002 56371 40180.0 3 5 39
10004 36174 30749.0 2 5 38
10005 63528 63528.0 1 1 36
10007 74572 58644.0 2 4 36
FROM employees
| KEEP emp_no, salary
| INLINE STATS avg_lt_50 = ROUND(AVG(salary)) WHERE salary < 50000,
               avg_lt_60 = ROUND(AVG(salary)) WHERE salary >=50000 AND salary < 60000,
               avg_gt_60 = ROUND(AVG(salary)) WHERE salary >= 60000
		
emp_no:integer salary:integer avg_lt_50:double avg_lt_60:double avg_gt_60:double
10001 57305 38292.0 54221.0 67286.0
10002 56371 38292.0 54221.0 67286.0
10003 61805 38292.0 54221.0 67286.0
10004 36174 38292.0 54221.0 67286.0
10005 63528 38292.0 54221.0 67286.0
  • The CATEGORIZE grouping function is not currently supported.
  • You cannot currently use LIMIT (explicit or implicit) before INLINE STATS, because this can lead to unexpected results.
  • You cannot currently use FORK before INLINE STATS, because FORK adds an implicit LIMIT to each branch, which can lead to unexpected results.