
getting sort field via script_fields results in array_index_out_of_bounds_exception for some values #99620

Closed
mad-pf opened this issue Sep 18, 2023 · 5 comments
Labels
>bug, feedback_needed, priority:normal, :Search Foundations/Search, Team:Search Foundations

Comments


mad-pf commented Sep 18, 2023

Elasticsearch Version

8.9.2

Installed Plugins

analysis-icu

Java Version

bundled

OS Version

as shipped in docker.elastic.co/elasticsearch/elasticsearch:8.9.2; Linux kernel 6.4

Problem Description

Reading the sort value (the icu_collation_keyword subfield `name.sort`) of some simple text values, such as "Q" or "W", in a Painless script results in an array_index_out_of_bounds_exception, while the same request works for other values such as "A" or "QQ".

This works on ES 7.17, but is also broken on ES 8.5. The analysis-icu plugin, which provides the icu_collation_keyword field type, is required to reproduce the problem.
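The stack trace shows the failure inside `BytesRef.utf8ToString`, that is, while the stored doc-value bytes are being decoded as UTF-8. An icu_collation_keyword field stores binary ICU collation keys in its doc values, not UTF-8 text. Lucene's `UnicodeUtil.UTF8toUTF16` is documented as not validating its input. A plausible explanation, then, is that some keys end in a byte that looks like the start of a multi-byte UTF-8 sequence, so the decoder reads past the end of the array. The Java sketch below illustrates that mechanism under stated assumptions. `naiveUtf8ToString` is a hypothetical, simplified decoder, not Lucene's code, and the byte array is an illustrative value, not the real collation key for "Q". The sketch also shows that a strict JDK decoder rejects the same bytes, and that Base64 is one lossless way to render binary keys as text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CollationKeyDecoding {

    // HYPOTHETICAL, simplified UTF-8 -> UTF-16 decoder (not Lucene's code).
    // It trusts each lead byte and reads the continuation bytes it implies
    // without checking array bounds or validating the bytes.
    // Lead bytes >= 0xE0 are all treated as 3-byte leads; 4-byte sequences
    // are out of scope for this sketch.
    static String naiveUtf8ToString(byte[] utf8) {
        char[] out = new char[utf8.length];
        int in = 0;
        int o = 0;
        while (in < utf8.length) {
            int b = utf8[in++] & 0xFF;
            if (b < 0x80) {
                out[o++] = (char) b; // single byte (ASCII)
            } else if (b < 0xE0) {
                // 2-byte lead: reads one more byte, unchecked
                out[o++] = (char) (((b & 0x1F) << 6) | (utf8[in++] & 0x3F));
            } else {
                // 3-byte lead: reads two more bytes, unchecked
                out[o++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[in++] & 0x3F) << 6)
                        | (utf8[in++] & 0x3F));
            }
        }
        return new String(out, 0, o);
    }

    // Strict check with the JDK decoder: truncated or malformed sequences
    // raise a CharacterCodingException instead of reading out of bounds.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative bytes only, NOT the real ICU collation key for "Q":
        // two single-byte values followed by a dangling 2-byte lead (0xDC).
        byte[] key = {0x2A, 0x01, (byte) 0xDC};

        System.out.println(isValidUtf8(key)); // prints false

        try {
            naiveUtf8ToString(key);
        } catch (ArrayIndexOutOfBoundsException e) {
            // prints "naive decoder: ArrayIndexOutOfBoundsException"
            System.out.println("naive decoder: " + e.getClass().getSimpleName());
        }

        // Base64 renders arbitrary bytes losslessly as ASCII text.
        System.out.println(Base64.getEncoder().encodeToString(key)); // prints KgHc
    }
}
```

The failing read happens at index 3 of a 3-byte array. This mirrors the "Index 6 out of bounds for length 6" shape of the reported error, where the decoder runs one element past the end of the buffer.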

Steps to Reproduce

The following script creates a simple index, adds two documents, and searches for each of them with script_fields. The request works for the first document; for the second, an error is returned:

#!/bin/sh

ES=http://localhost:9202
ESIDX=debug_69726

curl -s -XDELETE $ES/$ESIDX >/dev/null

curl -s -XPUT $ES/$ESIDX?pretty=true -H 'Content-Type: application/json' -d '{
    "settings": {}
}'

curl -s -XPOST $ES/$ESIDX/_mapping?pretty=true -H 'Content-Type: application/json' -d '{
    "properties": {
        "name": {
            "type": "keyword",
            "fields": {
                "sort": {
                    "type": "icu_collation_keyword"
                },
                "text": {
                    "type": "text"
                }
            }
        }
    }
}'

curl -s -XPOST $ES/$ESIDX/_bulk?pretty=true\&refresh=true -H 'Content-Type: application/json' -d \
'{"index": {"_id": "foo:1"}}
{"name": "A"}
{"index": {"_id": "foo:2"}}
{"name": "Q"}
'

echo "foo:1, no problem"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:1"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }   
}'

echo "foo:2, explodes"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:2"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }   
}'

Result for the first document (no problem):

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "debug_69726",
        "_id" : "foo:1",
        "_score" : 1.0,
        "fields" : {
          "_sort" : [
            "*\u0001\u0005\u0001܀"
          ]
        }
      }
    ]
  }
}

Result for the second document (the error):

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
          "doc['name.sort'].value",
          "                ^---- HERE"
        ],
        "script" : "doc['name.sort'].value",
        "lang" : "painless",
        "position" : {
          "offset" : 16,
          "start" : 0,
          "end" : 22
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "debug_69726",
        "node" : "PZ2dTLbxTWayGvJjxds3cg",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
            "doc['name.sort'].value",
            "                ^---- HERE"
          ],
          "script" : "doc['name.sort'].value",
          "lang" : "painless",
          "position" : {
            "offset" : 16,
            "start" : 0,
            "end" : 22
          },
          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 6 out of bounds for length 6"
          }
        }
      }
    ]
  },
  "status" : 400
}

Logs (if relevant)

No response

mad-pf added the >bug and needs:triage labels Sep 18, 2023
romseygeek added the :Search/Search label and removed the needs:triage label Sep 19, 2023
elasticsearchmachine added the Team:Search label Sep 19, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)


juntezhang commented Oct 3, 2023

This also affects sorting when the field is multi-valued (an array). With the max sort mode (the default for descending sorts), this exception is triggered when the field type is icu_collation_keyword.

juntezhang referenced this issue Nov 16, 2023
…pers (#99361)

We mostly have a handful of `FieldType` values here across all mappers and none of them contain
attributes. There's only so many combinations here, let's deduplicate these to save some heap and set up
subsequent mapper heap savings.
benwtrent added the priority:high label Jul 9, 2024
javanna added the :Search Foundations/Search label and removed the :Search/Search label Jul 17, 2024
elasticsearchmachine added the Team:Search Foundations label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine removed the Team:Search label Jul 17, 2024
@javanna
Member

javanna commented Nov 21, 2024

Hey @juntezhang, sorry for the lag. Looking at your recreation, I wonder what the purpose of the script is. Can you just retrieve the field without a script? Perhaps I am missing something. Besides that, given that the behaviour you are observing is field-type dependent, and perhaps analysis-chain dependent, it seems like some tokens are removed as part of text analysis. Could this be fixed by making your script more robust, by first checking that there are values for the field before retrieving them?

javanna added the priority:normal and feedback_needed labels and removed the priority:high label Nov 21, 2024
@javanna
Member

javanna commented Mar 14, 2025

Closing for lack of feedback. Feel free to post more info and we will reopen as needed.

javanna closed this as not planned Mar 14, 2025

6 participants