testing.asciidoc


Testing analyzers

The analyze API is an invaluable tool for viewing the terms produced by an analyzer. A built-in analyzer (or combination of built-in tokenizer, token filters, and character filters) can be specified inline in the request:

POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déjà vu?"
}
Positions and character offsets

As the output of the analyze API shows, analyzers do more than convert words into terms: they also record the relative position of each term (used for phrase queries and word-proximity queries) and the start and end character offsets of each term in the original text (used for highlighting search snippets).
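For the whitespace example above, the response looks roughly like the following (illustrative, abridged formatting). Note that the whitespace tokenizer neither lowercases "The" nor strips the trailing period from "fox.":

```json
{
  "tokens": [
    { "token": "The",   "start_offset": 0,  "end_offset": 3,  "type": "word", "position": 0 },
    { "token": "quick", "start_offset": 4,  "end_offset": 9,  "type": "word", "position": 1 },
    { "token": "brown", "start_offset": 10, "end_offset": 15, "type": "word", "position": 2 },
    { "token": "fox.",  "start_offset": 16, "end_offset": 20, "type": "word", "position": 3 }
  ]
}
```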

Alternatively, a custom analyzer can be referred to when running the analyze API on a specific index:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { (1)
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "std_folded" (2)
        }
      }
    }
  }
}

GET my_index/_analyze (3)
{
  "analyzer": "std_folded", (4)
  "text":     "Is this déjà vu?"
}

GET my_index/_analyze (3)
{
  "field": "my_text", (5)
  "text":  "Is this déjà vu?"
}
  1. Define a custom analyzer called std_folded.

  2. The field my_text uses the std_folded analyzer.

  3. To refer to this analyzer, the analyze API must specify the index name.

  4. Refer to the analyzer by name.

  5. Refer to the analyzer used by field my_text.
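As a rough illustration of what the std_folded chain produces, here is a minimal Python sketch. It is not a faithful reimplementation — the real standard tokenizer follows Unicode text segmentation rules, and asciifolding handles far more characters — but it approximates the pipeline for simple Latin-script input:

```python
import re
import unicodedata

def std_folded(text):
    """Rough approximation of the std_folded analyzer:
    standard-like tokenization, lowercasing, and ASCII folding."""
    # Approximate the asciifolding filter: decompose to NFD and
    # drop combining marks, so "déjà" becomes "deja".
    folded = "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )
    # Lowercase (the lowercase filter) and split on word characters,
    # a crude stand-in for the standard tokenizer.
    return re.findall(r"\w+", folded.lower())

print(std_folded("Is this déjà vu?"))  # ['is', 'this', 'deja', 'vu']
```

Running this against the example text yields the same terms that the analyze API reports for the std_folded analyzer.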