---
title: Anatomy of an analyzer
description: An analyzer — whether built-in or custom — is a package that contains three lower-level building blocks: character filters, tokenizers, and token filters...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/data-store/text-analysis/anatomy-of-an-analyzer
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Anatomy of an analyzer
An *analyzer* — whether built-in or custom — is a package that contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*.
The built-in [analyzers](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/analyzer-reference) pre-package these building blocks into analyzers suitable for different languages and types of text. Elasticsearch also exposes the individual building blocks so that they can be combined to define new [`custom`](https://www.elastic.co/elastic/docs-builder/docs/3016/manage-data/data-store/text-analysis/create-custom-analyzer) analyzers.

## Character filters

A *character filter* receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like `<b>` from the stream.
An analyzer may have **zero or more** [character filters](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/character-filter-reference), which are applied in order.
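To make the idea concrete, here is a minimal Python sketch of an HTML-stripping character filter, similar in spirit to the built-in `html_strip` character filter. The function name and regex are illustrative, not Elasticsearch's actual implementation:

```python
import re

def strip_html_char_filter(text: str) -> str:
    """Illustrative character filter: remove HTML tags from the
    character stream before it reaches the tokenizer."""
    return re.sub(r"<[^>]+>", "", text)

# The <b> element is stripped; the remaining characters flow on unchanged.
print(strip_html_char_filter("<b>Quick</b> brown fox"))  # Quick brown fox
```

Because character filters run before the tokenizer, stripping markup here keeps tags like `<b>` from ever becoming tokens.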

## Tokenizer

A *tokenizer* receives a stream of characters, breaks it up into individual *tokens* (usually individual words), and outputs a stream of *tokens*. For instance, a [`whitespace`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-whitespace-tokenizer) tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text `"Quick brown fox!"` into the terms `[Quick, brown, fox!]`.
The tokenizer is also responsible for recording the order or *position* of each term and the start and end *character offsets* of the original word that the term represents.
An analyzer must have **exactly one** [tokenizer](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/tokenizer-reference).
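The behavior of a whitespace tokenizer, including the position and character-offset bookkeeping described above, can be sketched in Python. This is a simplified illustration, not Elasticsearch's implementation; the field names mirror those returned by the `_analyze` API:

```python
import re

def whitespace_tokenizer(text: str) -> list[dict]:
    """Illustrative tokenizer: split on whitespace and record each
    token's position and start/end character offsets."""
    return [
        {
            "token": match.group(),
            "position": position,
            "start_offset": match.start(),
            "end_offset": match.end(),
        }
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]

tokens = whitespace_tokenizer("Quick brown fox!")
# [{'token': 'Quick', 'position': 0, 'start_offset': 0, 'end_offset': 5},
#  {'token': 'brown', 'position': 1, 'start_offset': 6, 'end_offset': 11},
#  {'token': 'fox!', 'position': 2, 'start_offset': 12, 'end_offset': 16}]
```

The offsets point back into the original text, which is what features like highlighting rely on.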

## Token filters

A *token filter* receives the token stream and may add, remove, or change tokens. For example, a [`lowercase`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-lowercase-tokenfilter) token filter converts all tokens to lowercase, a [`stop`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-stop-tokenfilter) token filter removes common words (*stop words*) like `the` from the token stream, and a [`synonym`](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/analysis-synonym-tokenfilter) token filter introduces synonyms into the token stream.
Token filters are not allowed to change the position or character offsets of each token.
An analyzer may have **zero or more** [token filters](https://docs-v3-preview.elastic.dev/elastic/docs-builder/docs/3016/reference/text-analysis/token-filter-reference), which are applied in order.
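A chain of token filters can be sketched as plain functions applied in order. This is an illustrative simplification (real filters operate on token streams that carry positions and offsets, which they must preserve), and the stop-word set below is a made-up subset:

```python
# Illustrative subset of stop words, not Elasticsearch's default list.
STOP_WORDS = {"the", "a", "an", "and", "or"}

def lowercase_filter(tokens: list[str]) -> list[str]:
    """Illustrative token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]

def stop_filter(tokens: list[str]) -> list[str]:
    """Illustrative token filter: drop common stop words."""
    return [token for token in tokens if token not in STOP_WORDS]

# Filters are applied in order: lowercase first, then stop-word removal.
tokens = ["The", "QUICK", "brown", "fox"]
result = stop_filter(lowercase_filter(tokens))  # ['quick', 'brown', 'fox']
```

Note that order matters: running `stop_filter` before `lowercase_filter` would miss `"The"`, since the stop-word set is lowercase.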