This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Commit aeed981

Added snippets for 120_Proximity_Matching
1 parent 762111f commit aeed981

15 files changed, +388 -10 lines changed

120_Proximity_Matching/05_Phrase_matching.asciidoc

+3
@@ -16,6 +16,7 @@ GET /my_index/my_type/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json
 
 Like the `match` query, the `match_phrase` query first analyzes the query
 string to produce a list of terms. It then searches for all the terms, but
@@ -38,6 +39,7 @@ The `match_phrase` query can also be written as a `match` query with type
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json
 
 ****
 
@@ -51,6 +53,7 @@ also the _position_ or order of each term in the original string:
 GET /_analyze?analyzer=standard
 Quick brown fox
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/05_Term_positions.json
 
 This returns:

120_Proximity_Matching/10_Slop.asciidoc

+1
@@ -20,6 +20,7 @@ GET /my_index/my_type/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/10_Slop.json
 
 The `slop` parameter tells the `match_phrase` query how far apart terms are
 allowed to be while still considering the document a match. By ``how far

120_Proximity_Matching/15_Multi_value_fields.asciidoc

+4
@@ -10,6 +10,7 @@ PUT /my_index/groups/1
     "names": [ "John Abraham", "Lincoln Smith"]
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
 
 Then run a phrase query for `"Abraham Lincoln"`:
 
@@ -24,6 +25,7 @@ GET /my_index/groups/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
 
 Surprisingly our document matches, even though `"Abraham"` and `"Lincoln"`
 belong to two different people in the `names` array. The reason for this comes
@@ -61,6 +63,8 @@ PUT /my_index/_mapping/groups <2>
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
+
 <1> First delete the `group` mapping and documents of that type.
 <2> Then create a new `group` mapping with the correct values.

120_Proximity_Matching/20_Scoring.asciidoc

+6-3
@@ -25,6 +25,8 @@ POST /my_index/my_type/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/20_Scoring.json
+
 <1> Note the high `slop` value
 
 [source,js]
@@ -33,19 +35,20 @@ POST /my_index/my_type/_search
     "hits": [
         {
             "_id": "3",
-            "_score": 0.75,
+            "_score": 0.75, <1>
             "_source": {
                 "title": "The quick brown fox jumps over the quick dog"
             }
         },
         {
             "_id": "2",
-            "_score": 0.28347334,
+            "_score": 0.28347334, <2>
             "_source": {
                 "title": "The quick brown fox jumps over the lazy dog"
             }
         }
     ]
 }
 --------------------------------------------------
-
+<1> Higher score because `quick` and `dog` are close together.
+<2> Lower score because `quick` and `dog` are further apart.
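
As a rough illustration of the two callouts above, the sketch below measures how far apart `quick` and `dog` are in documents 2 and 3. It is a minimal Python sketch under stated assumptions, not how Elasticsearch computes `_score`: the `min_distance` helper and its regex tokenization are hypothetical and exist only to make the "close together" versus "further apart" comparison concrete.

```python
import re


def min_distance(term_a, term_b, text):
    """Smallest gap, in token positions, between any occurrence of the two terms.

    Hypothetical helper: tokenization is a simple lowercase split on
    non-word characters, which only approximates a real analyzer.
    """
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    if not pos_a or not pos_b:
        return None  # one of the terms is missing from the text
    return min(abs(i - j) for i in pos_a for j in pos_b)


doc_2 = "The quick brown fox jumps over the lazy dog"
doc_3 = "The quick brown fox jumps over the quick dog"

print(min_distance("quick", "dog", doc_3))  # 1: the second `quick` is adjacent to `dog`
print(min_distance("quick", "dog", doc_2))  # 7: the only `quick` is far from `dog`
```

The smaller distance in document 3 lines up with its higher score in the response, though the actual score values come from Lucene's scoring, which this sketch does not model.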

120_Proximity_Matching/25_Relevance.asciidoc

+6-4
@@ -16,7 +16,7 @@ that we should combine them using the `bool` query.
 
 We can use a simple `match` query as a `must` clause. This is the query that
 will determine which documents are included in our resultset -- we can trim
-the long tail with the `minimum_must_match` parameter. Then we can add other
+the long tail with the `minimum_should_match` parameter. Then we can add other
 more specific queries as `should` clauses -- every one that matches will
 increase the relevance of the matching docs.
 
@@ -29,13 +29,13 @@ GET /my_index/my_type/_search
 "must": {
     "match": { <1>
         "title": {
-            "query": "quick brown fox",
-            "minimum_must_match": "30%"
+            "query": "quick brown fox",
+            "minimum_should_match": "30%"
         }
     }
 },
 "should": {
-    "match_phrase": <2>
+    "match_phrase": { <2>
         "title": {
             "query": "quick brown fox",
             "slop": 50
@@ -46,6 +46,8 @@ GET /my_index/my_type/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/25_Relevance.json
+
 <1> The `must` clause includes or excludes documents from the resultset.
 <2> The `should` clause increases the relevance score of those documents that
 match.

120_Proximity_Matching/30_Performance.asciidoc

+5-3
@@ -58,16 +58,16 @@ GET /my_index/my_type/_search
 "query": {
     "match": { <1>
         "title": {
-            "query": "quick brown fox",
-            "minimum_must_match": "30%"
+            "query": "quick brown fox",
+            "minimum_should_match": "30%"
         }
     }
 },
 "rescore": {
     "window_size": 50, <2>
     "query": { <3>
         "rescore_query": {
-            "match_phrase":
+            "match_phrase": {
                 "title": {
                     "query": "quick brown fox",
                     "slop": 50
@@ -78,6 +78,8 @@ GET /my_index/my_type/_search
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/30_Performance.json
+
 <1> The `match` query decides which results will be included in the final
 result set and ranks results according to TF/IDF.
 <2> The `window_size` is the number of top results to rescore, per shard.

120_Proximity_Matching/35_Shingles.asciidoc

+1
@@ -92,6 +92,7 @@ PUT /my_index
     }
 }
 --------------------------------------------------
+// SENSE: 120_Proximity_Matching/35_Shingles.json
 
 <1> See <<relevance-is-broken>>.
 <2> The default min/max shingle size is `2` so we don't really need to set
@@ -0,0 +1,40 @@
+# Delete the `my_index` index
+DELETE /my_index
+
+# Create `my_index` with a single primary shard
+PUT /my_index
+{ "settings": { "number_of_shards": 1 }}
+
+# Index some example docs
+POST /my_index/my_type/_bulk
+{ "index": { "_id": 1 }}
+{ "title": "The quick brown fox" }
+{ "index": { "_id": 2 }}
+{ "title": "The quick brown fox jumps over the lazy dog" }
+{ "index": { "_id": 3 }}
+{ "title": "The quick brown fox jumps over the quick dog" }
+{ "index": { "_id": 4 }}
+{ "title": "Brown fox brown dog" }
+
+# match_phrase query
+GET /my_index/my_type/_search
+{
+  "query": {
+    "match_phrase": {
+      "title": "quick brown fox"
+    }
+  }
+}
+
+# match query, type phrase
+GET /my_index/my_type/_search
+{
+  "query": {
+    "match": {
+      "title": {
+        "type": "phrase",
+        "query": "quick brown fox"
+      }
+    }
+  }
+}
@@ -0,0 +1,3 @@
+# Term positions
+GET /_analyze?text=Quick brown fox
+
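
The `_analyze` request above returns each term together with its position in the original string. The idea can be sketched in a few lines of Python. This is only an approximation under stated assumptions: `analyze` is a hypothetical stand-in that lowercases and splits on non-word characters, not the standard analyzer itself, and it numbers positions from zero, which may differ from what a given Elasticsearch version reports.

```python
import re


def analyze(text):
    """Hypothetical stand-in for an analyzer: emit each token with its position."""
    tokens = re.findall(r"\w+", text.lower())
    return [{"token": tok, "position": pos} for pos, tok in enumerate(tokens)]


for entry in analyze("Quick brown fox"):
    print(entry)
```

Phrase and proximity queries rely on these recorded positions to decide whether terms appear next to each other.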
@@ -0,0 +1,44 @@
+# Delete the `my_index` index
+DELETE /my_index
+
+# Create `my_index` with a single primary shard
+PUT /my_index
+{ "settings": { "number_of_shards": 1 }}
+
+# Index some example docs
+POST /my_index/my_type/_bulk
+{ "index": { "_id": 1 }}
+{ "title": "The quick brown fox" }
+{ "index": { "_id": 2 }}
+{ "title": "The quick brown fox jumps over the lazy dog" }
+{ "index": { "_id": 3 }}
+{ "title": "The quick brown fox jumps over the quick dog" }
+{ "index": { "_id": 4 }}
+{ "title": "Brown fox brown dog" }
+
+
+# Phrase query - doesn't match
+GET /my_index/my_type/_search
+{
+  "query": {
+    "match_phrase": {
+      "title": {
+        "query": "quick fox"
+      }
+    }
+  }
+}
+
+
+# Proximity query with slop - matches
+GET /my_index/my_type/_search
+{
+  "query": {
+    "match_phrase": {
+      "title": {
+        "query": "quick fox",
+        "slop": 1
+      }
+    }
+  }
+}
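
To see why `quick fox` fails as a strict phrase but matches with `"slop": 1`, the following Python sketch uses a simplified model of slop: for each query term it subtracts the term's offset in the query from its position in the document, and the spread of those values is the slop needed. This is an assumption made for illustration; Lucene's sloppy phrase matching is more involved, and the helper names `positions`, `required_slop`, and `phrase_matches` are hypothetical.

```python
import re
from itertools import product


def positions(text):
    """Map each lowercased token to the list of positions where it occurs."""
    index = {}
    for pos, tok in enumerate(re.findall(r"\w+", text.lower())):
        index.setdefault(tok, []).append(pos)
    return index


def required_slop(query, text):
    """Smallest slop under the simplified model, or None if a term is missing."""
    doc = positions(text)
    terms = re.findall(r"\w+", query.lower())
    if not terms or any(term not in doc for term in terms):
        return None
    best = None
    # Try every combination of occurrences, one position per query term.
    for combo in product(*(doc[term] for term in terms)):
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        cost = max(shifts) - min(shifts)
        best = cost if best is None else min(best, cost)
    return best


def phrase_matches(query, text, slop=0):
    needed = required_slop(query, text)
    return needed is not None and needed <= slop


title = "The quick brown fox"
print(phrase_matches("quick fox", title))          # False: `brown` sits in between
print(phrase_matches("quick fox", title, slop=1))  # True: one position of slack is enough
```

Under this model an exact phrase needs a slop of 0, and every extra position between or around the terms raises the requirement by one.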
@@ -0,0 +1,68 @@
+# Delete the `my_index` index
+DELETE /my_index
+
+# Create `my_index` with a single primary shard
+PUT /my_index
+{ "settings": { "number_of_shards": 1 }}
+
+# Index an example doc
+PUT /my_index/groups/1
+{
+  "names": [
+    "John Abraham",
+    "Lincoln Smith"
+  ]
+}
+
+# Phrase "Abraham Lincoln" matches!
+GET /my_index/groups/_search
+{
+  "query": {
+    "match_phrase": {
+      "names": "Abraham Lincoln"
+    }
+  }
+}
+
+# Delete `groups` mapping and data
+DELETE /my_index/groups/
+
+# Map `names` to use position_offset_gap
+PUT /my_index/_mapping/groups
+{
+  "properties": {
+    "names": {
+      "type": "string",
+      "position_offset_gap": 100
+    }
+  }
+}
+
+# Reindex document
+PUT /my_index/groups/1
+{
+  "names": [
+    "John Abraham",
+    "Lincoln Smith"
+  ]
+}
+
+# Phrase "Abraham Lincoln" no longer matches
+GET /my_index/groups/_search
+{
+  "query": {
+    "match_phrase": {
+      "names": "Abraham Lincoln"
+    }
+  }
+}
+
+# But phrase "John Abraham" does
+GET /my_index/groups/_search
+{
+  "query": {
+    "match_phrase": {
+      "names": "John Abraham"
+    }
+  }
+}
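
The effect of `position_offset_gap` in the snippet above can be sketched in Python. The sketch assumes that the values of an array field are indexed as one stream of positions and that the gap is added between consecutive values. The exact position arithmetic is an assumption and may differ between Elasticsearch versions, and `index_positions` and `matches_phrase` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def index_positions(values, position_offset_gap=0):
    """Assign positions across all values of a multi-value field.

    Assumption: each new value starts `position_offset_gap` positions
    after the point where the previous value ended.
    """
    index = {}
    next_pos = 0
    for value in values:
        tokens = tokenize(value)
        for offset, tok in enumerate(tokens):
            index.setdefault(tok, []).append(next_pos + offset)
        next_pos += len(tokens) + position_offset_gap
    return index


def matches_phrase(phrase, index):
    """True if the phrase terms occur at consecutive positions (no slop)."""
    terms = tokenize(phrase)
    if not terms or any(term not in index for term in terms):
        return False
    return any(
        all(start + i in index[term] for i, term in enumerate(terms))
        for start in index[terms[0]]
    )


names = ["John Abraham", "Lincoln Smith"]

no_gap = index_positions(names)
with_gap = index_positions(names, position_offset_gap=100)

print(matches_phrase("Abraham Lincoln", no_gap))    # True: the terms are adjacent across values
print(matches_phrase("Abraham Lincoln", with_gap))  # False: the gap separates the two values
print(matches_phrase("John Abraham", with_gap))     # True: both terms are in the same value
```

A gap of 100 is large enough that a phrase query would need a very high `slop` before it could match terms from two different array values.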
@@ -0,0 +1,30 @@
+# Delete the `my_index` index
+DELETE /my_index
+
+# Create `my_index` with a single primary shard
+PUT /my_index
+{ "settings": { "number_of_shards": 1 }}
+
+# Index some example docs
+POST /my_index/my_type/_bulk
+{ "index": { "_id": 1 }}
+{ "title": "The quick brown fox" }
+{ "index": { "_id": 2 }}
+{ "title": "The quick brown fox jumps over the lazy dog" }
+{ "index": { "_id": 3 }}
+{ "title": "The quick brown fox jumps over the quick dog" }
+{ "index": { "_id": 4 }}
+{ "title": "Brown fox brown dog" }
+
+# High slop value
+POST /my_index/my_type/_search
+{
+  "query": {
+    "match_phrase": {
+      "title": {
+        "query": "quick dog",
+        "slop": 50
+      }
+    }
+  }
+}
@@ -0,0 +1,42 @@
+# Delete the `my_index` index
+DELETE /my_index
+
+# Create `my_index` with a single primary shard
+PUT /my_index
+{ "settings": { "number_of_shards": 1 }}
+
+# Index some example docs
+POST /my_index/my_type/_bulk
+{ "index": { "_id": 1 }}
+{ "title": "The quick brown fox" }
+{ "index": { "_id": 2 }}
+{ "title": "The quick brown fox jumps over the lazy dog" }
+{ "index": { "_id": 3 }}
+{ "title": "The quick brown fox jumps over the quick dog" }
+{ "index": { "_id": 4 }}
+{ "title": "Brown fox brown dog" }
+
+# Combine phrase with match query to boost relevance
+GET /my_index/my_type/_search
+{
+  "query": {
+    "bool": {
+      "must": {
+        "match": {
+          "title": {
+            "query": "quick brown fox",
+            "minimum_should_match": "30%"
+          }
+        }
+      },
+      "should": {
+        "match_phrase": {
+          "title": {
+            "query": "quick brown fox",
+            "slop": 50
+          }
+        }
+      }
+    }
+  }
+}
