Rank Vectors

Warning

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

The rank_vectors field type enables late-interaction dense vector scoring in Elasticsearch. The number of vectors per field can vary, but they must all share the same number of dimensions and element type.

The vectors stored in this field are intended for second-order ranking of documents using max-sim similarity.

Here is a simple example of using this field with float elements.

PUT my-rank-vectors-float
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "rank_vectors"
      }
    }
  }
}

PUT my-rank-vectors-float/_doc/1
{
  "my_vector" : [[0.5, 10, 6], [-0.5, 10, 10]]
}

In addition to the float element type, byte and bit element types are also supported.

Here is an example of using this field with byte elements.

PUT my-rank-vectors-byte
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "rank_vectors",
        "element_type": "byte"
      }
    }
  }
}

PUT my-rank-vectors-byte/_doc/1
{
  "my_vector" : [[1, 2, 3], [4, 5, 6]]
}

Here is an example of using this field with bit elements.

PUT my-rank-vectors-bit
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "rank_vectors",
        "element_type": "bit"
      }
    }
  }
}

POST /my-rank-vectors-bit/_bulk?refresh
{"index": {"_id" : "1"}}
{"my_vector": [127, -127, 0, 1, 42]}
{"index": {"_id" : "2"}}
{"my_vector": "8100012a7f"}
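
The two bulk entries above supply bit vectors in two forms: an array of signed bytes and a hex string. The following Python sketch shows how the two forms relate, assuming each byte packs 8 bits and the hex string uses two hex digits per byte. The helper names are hypothetical and are not part of any Elasticsearch client.

```python
def hex_to_signed_bytes(hex_string):
    """Decode a hex string into signed 8-bit integers (-128..127)."""
    return [b - 256 if b > 127 else b for b in bytes.fromhex(hex_string)]


def signed_bytes_to_hex(values):
    """Encode signed 8-bit integers as a lowercase hex string."""
    return bytes(v & 0xFF for v in values).hex()


doc_2 = hex_to_signed_bytes("8100012a7f")
print(doc_2)           # [-127, 0, 1, 42, 127]
print(len(doc_2) * 8)  # 40 bits in total
print(signed_bytes_to_hex([127, -127, 0, 1, 42]))  # 7f8100012a
```

Under these assumptions, documents 1 and 2 hold the same five byte values in a different order, so they are two distinct 40-bit vectors.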

The rank_vectors field type supports the following parameters:

element_type
(Optional, string) The data type used to encode vectors. The supported data types are float (default), byte, and bit.
dims
(Optional, integer) Number of vector dimensions. Can’t exceed 4096. If dims is not specified, it will be set to the length of the first vector added to the field.
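
As a rough illustration of the dims rule, the following Python sketch checks a list of vectors on the client side before indexing. It assumes float or byte vectors, where the length of each vector equals its number of dimensions. The function name and error messages are hypothetical, and Elasticsearch performs its own validation regardless.

```python
MAX_DIMS = 4096  # upper bound stated for the dims parameter


def check_rank_vectors(vectors, dims=None):
    """Return the effective dims, or raise ValueError on a mismatch."""
    if not vectors:
        raise ValueError("at least one vector is required")
    # If dims is not given, take it from the first vector.
    effective = dims if dims is not None else len(vectors[0])
    if effective < 1 or effective > MAX_DIMS:
        raise ValueError(f"dims must be between 1 and {MAX_DIMS}")
    for position, vector in enumerate(vectors):
        if len(vector) != effective:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"expected {effective}"
            )
    return effective


print(check_rank_vectors([[0.5, 10, 6], [-0.5, 10, 10]]))  # 3
```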

Important

Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

rank_vectors fields support synthetic _source.

Rank vectors can be accessed and used in script_score queries.

For example, the following query scores documents based on the maxSim similarity between the query vector and the vectors stored in the my_vector field:

GET my-rank-vectors-float/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'my_vector')",
        "params": {
          "query_vector": [[0.5, 10, 6], [-0.5, 10, 10]]
        }
      }
    }
  }
}
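
To make the scoring in the query above concrete, here is a small Python sketch of a max-sim dot product. It assumes the usual late-interaction definition: for each query vector, take the highest dot product against any document vector, then sum those maxima. It illustrates the idea only and is not the Elasticsearch implementation; the function names are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim_dot_product(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best dot product with any doc vector."""
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Query vectors from the request above, scored against the document
# indexed earlier into my-rank-vectors-float.
query_vectors = [[0.5, 10, 6], [-0.5, 10, 10]]
doc_vectors = [[0.5, 10, 6], [-0.5, 10, 10]]
print(max_sim_dot_product(query_vectors, doc_vectors))  # 360.0
```

Here the first query vector matches best against the second document vector (159.75) and the second query vector matches best against itself (200.25), which gives 360.0 under this definition.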

Additionally, asymmetric similarity functions can be used to score against bit vectors. For example, the following query scores documents based on the maxSimDotProduct similarity between a floating-point query vector and the bit vectors stored in the my_vector field:

PUT my-rank-vectors-bit
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "rank_vectors",
        "element_type": "bit"
      }
    }
  }
}

POST /my-rank-vectors-bit/_bulk?refresh
{"index": {"_id" : "1"}}
{"my_vector": [127, -127, 0, 1, 42]}
{"index": {"_id" : "2"}}
{"my_vector": "8100012a7f"}

GET my-rank-vectors-bit/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'my_vector')",
        "params": {
          "query_vector": [
            [0.35, 0.77, 0.95, 0.15, 0.11, 0.08, 0.58, 0.06, 0.44, 0.52, 0.21,
             0.62, 0.65, 0.16, 0.64, 0.39, 0.93, 0.06, 0.93, 0.31, 0.92, 0.0,
             0.66, 0.86, 0.92, 0.03, 0.81, 0.31, 0.2, 0.92, 0.95, 0.64, 0.19,
             0.26, 0.77, 0.64, 0.78, 0.32, 0.97, 0.84]
          ]
        }
      }
    }
  }
}

Note that the query vector has 40 elements, matching the number of bits in the bit vectors.
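
The following Python sketch illustrates why the element count must match the bit count. It assumes that an asymmetric dot product between a float query and a bit vector sums the query values at positions where the bit is set, and that bits are read most significant bit first within each byte. Both points are assumptions made for illustration, not a description of the Elasticsearch internals, and the function names are hypothetical.

```python
def unpack_bits(signed_bytes):
    """Expand signed bytes into 0/1 values, most significant bit first."""
    bits = []
    for value in signed_bytes:
        unsigned = value & 0xFF  # map -128..127 onto 0..255
        bits.extend((unsigned >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def asymmetric_dot(query, signed_bytes):
    """Sum the query values at positions where the stored bit is 1."""
    bits = unpack_bits(signed_bytes)
    if len(query) != len(bits):
        raise ValueError(
            f"query has {len(query)} elements, bit vector has {len(bits)} bits"
        )
    return sum(q for q, bit in zip(query, bits) if bit)


stored = [127, -127, 0, 1, 42]   # document 1 from the bulk request
print(len(unpack_bits(stored)))  # 40, one bit per query element
```

A query vector with any other length would have no one-to-one pairing with the stored bits, which is why the sketch rejects it.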