---
title: Data modelling tips
description: Annotations are normally a way of weaving structured information into unstructured text for higher-precision search. Entity resolution is a form of document...
url: https://www.elastic.co/elastic/docs-builder/docs/3016/reference/elasticsearch/plugins/mapper-annotated-text-tips
products:
  - Elasticsearch
---

# Data modelling tips
## Use structured and unstructured fields

Annotations are normally a way of weaving structured information into unstructured text for higher-precision search.
*Entity resolution* is a form of document enrichment, undertaken by specialist software or people, in which references to entities in a document are disambiguated by attaching a canonical ID. The ID is used to resolve any number of aliases or to distinguish between people with the same name. The hyperlinks connecting Wikipedia’s articles are a good example of resolved entity IDs woven into text.
These IDs can be embedded as annotations in an `annotated_text` field, but it often makes sense to also include them in dedicated structured fields to support discovery via aggregations:
```json
{
  "mappings": {
    "properties": {
      "my_unstructured_text_field": {
        "type": "annotated_text"
      },
      "my_twitter_handles": {
        "type": "text",
        "fields": {
          "keyword" : {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```

Applications would then typically provide content and discover it as follows:
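```json
# Example documents
{
  "my_unstructured_text_field": "[Shay](%40kimchy) created elasticsearch",
  "my_twitter_handles": ["@kimchy"] <1>
}

# Example search
{
  "query": {
    "query_string": {
      "query": "elasticsearch OR logstash OR kibana", <2>
      "default_field": "my_unstructured_text_field"
    }
  },
  "aggregations": {
    "top_people": {
      "significant_terms": { <3>
        "field": "my_twitter_handles.keyword"
      }
    }
  }
}
```

1. `my_twitter_handles` holds the same annotation values that are embedded in the unstructured text. Inside the annotated text the value is URL-encoded, so `@kimchy` appears as `%40kimchy`.
2. The query searches the unstructured text for documents that mention components of the Elastic Stack.
3. The `significant_terms` aggregation runs on the `keyword` sub-field of the structured field to find the handles most strongly associated with the matching documents.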
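
An application has to keep the annotated text and the structured field in step. The following is a minimal sketch of one way to do that in Python, not part of the plugin itself: the helper names `annotate` and `build_document` are hypothetical, and it assumes one value per annotation. Values are URL-encoded with `urllib.parse.quote`, which turns `@` into `%40` as in the example document above.

```python
from urllib.parse import quote


def annotate(text: str, value: str) -> str:
    """Wrap `text` in annotation markup: [text](url-encoded value).

    Hypothetical helper; handles a single annotation value only.
    """
    return f"[{text}]({quote(value, safe='')})"


def build_document(parts: list) -> dict:
    """Build a document from plain strings and (text, value) pairs.

    Each annotated pair contributes markup to the unstructured field and
    its raw value to the structured field, so the two never drift apart.
    """
    pieces = []
    handles = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)
        else:
            text, value = part
            pieces.append(annotate(text, value))
            if value not in handles:
                handles.append(value)
    return {
        "my_unstructured_text_field": "".join(pieces),
        "my_twitter_handles": handles,
    }


doc = build_document([("Shay", "@kimchy"), " created elasticsearch"])
print(doc)
```

Because both fields are derived from the same input, any term returned by an aggregation on the structured field can be used directly as a search term against the annotated text.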


## Avoiding over-matching annotations

By design, the regular text tokens and the annotation tokens co-exist in the same indexed field, but in rare cases this can lead to some over-matching.
The value of an annotation often denotes a *named entity* (a person, place or company). The tokens for these named entities are inserted untokenized, and differ from typical text tokens because they normally:
- Are mixed case, e.g. `Madonna`
- Contain multiple words, e.g. `Jeff Beck`
- Can contain punctuation or numbers, e.g. `Apple Inc.` or `@kimchy`

This means that, for the most part, a search for a named entity in the annotated text field will not produce false positives. For example, when selecting `Apple Inc.` from an aggregation result, you can drill down to highlight its uses in the text without over-matching on text tokens such as the word `apple` in this context:
```
the apple was very juicy
```

However, a problem arises if your named entity happens to be a single lower-case term, e.g. the company `elastic`. In this case, a search on the annotated text field for the token `elastic` may match a text document such as this:
```
they fired an elastic band
```
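
The two cases above can be illustrated with a deliberately simplified model of what ends up in the index. This sketch is an assumption-laden toy, not the plugin's real analysis chain: it assumes text is lower-cased and split on word boundaries, that annotation values are URL-decoded and kept verbatim, and that each annotation carries a single value. The function name `indexed_tokens` is hypothetical.

```python
import re
from urllib.parse import unquote

# Matches annotation markup of the form [text](value).
ANNOTATION = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def indexed_tokens(annotated: str) -> set[str]:
    """Toy model: the set of tokens a field would hold for this text."""
    # Annotation values are inserted untokenized, exactly as written.
    tokens = {unquote(value) for _text, value in ANNOTATION.findall(annotated)}
    # The remaining text (markup removed) is lower-cased and split into words.
    plain = ANNOTATION.sub(r"\1", annotated)
    tokens.update(re.findall(r"\w+", plain.lower()))
    return tokens


fruit = indexed_tokens("the apple was very juicy")
band = indexed_tokens("they fired an elastic band")
news = indexed_tokens("[Apple](Apple%20Inc.) opened a new store")

print("Apple Inc." in news)   # True: the annotation token is present
print("Apple Inc." in fruit)  # False: no plain-text token can look like this
print("elastic" in band)      # True: a lower-case, single-word value collides
```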

To avoid such false matches, consider prefixing annotation values so that they cannot clash with text tokens, for example:
```
[elastic](Company_elastic) released version 7.0 of the elastic stack today
```