Index fundamentals
An index is the fundamental unit of storage in Elasticsearch, and the level at which you interact with your data. You can store many independent datasets side by side.
To store a document, you add it to a specific index. To search, you target one or more indices. Elasticsearch searches all data within them and returns any matching documents. You can target your data by index name, through an alias that points to one or more indices, or through a data stream that routes requests to the appropriate backing indices.
Behind the scenes, Elasticsearch divides each index into shards and distributes them across the nodes in your cluster. The horizontal scaling of your primary shard into replica shards across other nodes allows your index to handle large volumes of traffic efficiently. Replica shards provide fault tolerance, keeping your data available even when an individual node's response fails.
This page explains the core parts of an index (documents, mappings, and settings), describes how Elasticsearch physically stores index data using shards, and highlights common design decisions.
In Elastic Cloud Serverless:
- Shards, replicas, and nodes are fully managed for you. The platform automatically scales resources based on your workload, so you don't need to configure or monitor these details. The shard-related content on this page explains how Elasticsearch works under the hood.
- Each project supports up to 15,000 total indices. This limit helps ensure reliable performance and stability. If you need a higher limit, you can request an increase. For index sizing recommendations, refer to index sizing guidelines.
An index is made up of the following components:
- Documents: The JSON objects that hold your data, including system-managed metadata fields like
_indexand_id. - Mappings: Definitions that specify field data types, and control how data is indexed and queried. Understanding field data types helps you write effective queries and avoid indexing problems.
- Settings: Index-level configuration such as shard count, replica count, and refresh interval that controls storage and performance behavior.
Elasticsearch serializes and stores data in the form of JSON documents. A document is a set of fields, which are key-value pairs that contain your data. Each document has a unique ID, which you can specify explicitly or have Elasticsearch auto-generate. An indexed document includes both document fields you define and system-managed metadata.
A simple Elasticsearch document might look like this:
{
"_index": "my-first-elasticsearch-index",
"_id": "DyFpo5EBxE8fzbb95DOa",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"email": "john@smith.com",
"first_name": "John",
"last_name": "Smith",
"info": {
"bio": "Eco-warrior and defender of the weak",
"age": 25,
"interests": [
"dolphins",
"whales"
]
},
"join_date": "2024/05/01"
}
}
- Metadata fields are system-managed fields prefixed with an underscore.
_indexidentifies which index stores the document and_idis the document's unique identifier within that index. - The
_sourcefield contains the original document body as submitted. The fields inside_sourceare the ones you control through mappings. You can define these mappings explicitly or have Elasticsearch create them for you dynamically when your data is ingested.
Each index has a mapping that defines the data type for each field, how the field should be indexed, and how it should be stored.
For example, the following mapping defines field types for a few common data types:
{
"properties": {
"email": { "type": "keyword" },
"first_name": { "type": "text" },
"age": { "type": "integer" },
"join_date": { "type": "date" }
}
}
Each index has settings that control its storage and performance behavior. Settings are configured when the index is created, either directly in the create index request or through an index template. Some index settings can be updated dynamically on a live index.
Common settings include:
index.number_of_shards: The number of primary shards. Fixed at creation.index.number_of_replicas: The number of replica copies per primary shard. Can be changed at any time.index.refresh_interval: How often new data becomes searchable. Defaults to1s.
For the full list of available settings, refer to Index settings.
When you create an index, Elasticsearch doesn't store all its documents in a single location. Instead, it divides the index into one or more shards and distributes those shards across the nodes in your cluster. Each shard is a self-contained Apache Lucene index with practical limits on how much data it can efficiently manage, so splitting data across multiple shards keeps individual shards performant. Distributing those shards across cluster nodes adds horizontal scaling and redundancy. The right number of shards depends on your data volume, query patterns, and cluster topology — there is no single correct answer. Refer to shard sizing and distribution recommendations for more information and best practices.
You don't interact with shards directly when indexing or searching. Instead, you target the index by name and Elasticsearch routes the operation to the appropriate shards. However, the number and size of shards you configure affects performance and stability. Refer to Common index design decisions for more information.
A shard holds a subset of the index's documents and can independently handle indexing and search operations. Inside each shard, data is organized into immutable segments that are written as documents are indexed. To learn how segments affect search availability, refer to Near real-time search.
There are two types of shards:
- Primary shards: Every document belongs to exactly one primary shard. The number of primary shards is fixed at index creation, either through an index template or the
index.number_of_shardssetting in the create index request. - Replica shards: Copies of primary shards that provide redundancy and serve read requests. You can adjust the number of replicas at any time using the
index.number_of_replicassetting.
By distributing shards across multiple nodes, Elasticsearch can scale horizontally and continue operating even when individual nodes fail. For a detailed explanation of this distributed model, refer to Distributed architecture. To learn how Elasticsearch coordinates reads and writes across primary and replica shards, refer to Reading and writing documents.
Setting up your Elasticsearch indices involves making some design decisions about the index components: mappings control how the index fields are created for different data types, templates standardize the configuration of settings across indices, aliases decouple queries from the index names, and lifecycle policies automate how the data is stored over time.
When working with indices, you typically make decisions that focus on:
- Naming and aliases: Use clear naming patterns for your indices and aliases to simplify query targets and support index changes with minimal disruption.
- Mapping strategy: Use dynamic mapping for speed when exploring data, and explicit mappings for production use cases. Choosing the right field type up front matters because it controls what queries and aggregations are available, and changing a field type later requires reindexing.
- Index or data stream: Use a regular index when you need frequent updates or deletes. For append-only, time series data such as logs, events, and metrics, use a data stream instead, since data streams manage rolling indices automatically.
- Shard sizing: For production workloads, the number and size of shards affect query speed and cluster stability. Refer to Size your shards for guidelines.
- Data lifecycle: Decide how long to keep data, when to move it to cheaper tiers, and when to delete it. Refer to Data lifecycle for more information.
Now that you understand index fundamentals, explore these pages for hands-on tasks:
- Manage indices in Kibana: View, investigate, and perform operations on indices, data streams, and enrich policies in Kibana.
- Templates: Create and manage index templates and component templates.
- Manage a data stream in Kibana: Create, monitor, and manage data streams and their backing indices.
- Data enrichment: Set up enrich policies to add data from existing indices to incoming documents.
- Manage data using APIs: Index, update, retrieve, search, and delete documents using the Elasticsearch REST API.