using synonym_graph force elastic to double score the document #28982
Comments
Pinging @elastic/es-search-aggs
Any update around this issue?
Yesterday I found another unacceptable issue that seems to be similar to the one above (synonym_graph).
@romseygeek could you take a look at this?
I think it's reasonable to use a max disjunction for multi-term synonyms. Currently the scores of the matching synonyms are simply added, but we should select the max score.
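To make the difference between adding synonym scores and taking their maximum concrete, here is a toy sketch. It is not Lucene's or Elasticsearch's implementation; the function names and the score values are hypothetical and only model a document that matches both synonym alternatives.

```python
# Toy model only: the scores below are made-up numbers, not explain output.

def combine_by_sum(alternative_scores):
    """Behaviour described in the comment: matching synonym scores are added."""
    return sum(alternative_scores.values())


def combine_by_max(alternative_scores):
    """Proposed behaviour: keep only the best-scoring synonym alternative."""
    return max(alternative_scores.values(), default=0.0)


# Hypothetical document that matches both "cafe" and "coffee shop".
doc_scores = {"cafe": 3.0, "coffee shop": 5.0}

print(combine_by_sum(doc_scores))  # 8.0
print(combine_by_max(doc_scores))  # 5.0
```

With a max, a document is not rewarded merely for matching several alternatives of the same synonym. It does not by itself equalize a single-term match and a multi-term match, which is the second step mentioned further down the thread.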
So is there any easy way to handle it in the current version (6.2.2)?
No, there is no workaround in the current version. We'll need a patch: first to select the best synonym score per document, which as I said should be trivial to do, and then to work on a solution to produce similar scores for documents that match
Hello, will the first step of the solution explained by jimczi be implemented in Elasticsearch in the current 7.x version? Currently, scoring with multi-word synonyms is a bit hard to work with.
Hello @jimczi, is there any space in your roadmap for this improvement? Thanks
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Any update on this? This bug, along with the fact that multi-word synonyms cannot be phrase-matched in a multi_match query when other terms are also included in the query, makes them essentially impossible to use.
There is no update @skaindl and no estimation as to when this will be fully addressed.
Elasticsearch version: 6.2.2, Build: 10b1edd/2018-02-16T19:01:30.685723Z
Plugins installed: []
JVM version: 1.8.0_144
OS version: Ubuntu, Linux 4.4.0-104-generic
When I use `synonym_graph` in a search-time analyzer, some terms that consist of more than one token, for example `coffee shop`, are treated as two words and their score is doubled!
I defined `coffee shop` as a synonym of `cafe`. Then, when I search for `cafe`, all documents that have `coffee shop` in their titles get higher scores than equivalent documents that have `cafe` in their titles (about 2 times higher).
I used the Explain API and found these scores returned by Elasticsearch:
For a document with coffee shop in its title, sum of:
59.249336 weight(search:coffee in 9429) [PerFieldSimilarity]
63.80951 weight(search:shop in 9429) [PerFieldSimilarity]
And for another document with cafe in its title:
34.8931 weight(search:cafe in 4409) [PerFieldSimilarity]
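The arithmetic behind the quoted explain output can be checked with a few lines. This sketch only uses the three weights shown above; the variable names are made up for illustration, and any other clauses that contribute to the final document scores are not part of it.

```python
import math

# Weights copied from the explain output quoted in the issue.
coffee_weight = 59.249336
shop_weight = 63.80951
cafe_weight = 34.8931

# The "coffee shop" document is scored as the sum of its two term weights.
coffee_shop_total = coffee_weight + shop_weight

# How much larger the summed score is than the single "cafe" weight.
ratio = coffee_shop_total / cafe_weight

print(round(coffee_shop_total, 6))  # 123.058846
print(round(ratio, 2))              # 3.53
```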
Is this a bug in synonym_graph, or did I make a mistake?
PS: all other keywords for these two documents are the same.
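The issue does not include the index settings, so the following is only a sketch of what a search-time `synonym_graph` setup like the one described might look like. The filter name `my_synonyms` and the analyzer name `search_with_synonyms` are hypothetical; the `standard` tokenizer and `lowercase` filter are assumptions, not the reporter's confirmed configuration.

```python
import json

# Assumed analysis settings; names are hypothetical and not taken from the issue.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",
                    # Solr-style rule: the comma-separated terms are equivalent.
                    "synonyms": ["cafe, coffee shop"],
                }
            },
            "analyzer": {
                # Intended to be used as a field's search_analyzer only.
                "search_with_synonyms": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    }
}

# Request body that could be sent when creating the index.
print(json.dumps(index_settings, indent=2))
```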