This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Commit 8f1ed17

Added geolocation chapter
1 parent 0e79bfa commit 8f1ed17

22 files changed, +1606 -28 lines changed

010_Intro/45_Distributed.asciidoc

+3 -2
@@ -34,8 +34,9 @@ the operations happening automatically under the hood include:
 As you read through this book, you'll encounter supplemental chapters about the
 distributed nature of Elasticsearch. These chapters will teach you about
 how the cluster scales and deals with failover (<<distributed-cluster>>),
-handles document storage (<<distributed-docs>>) and executes distributed search
-(<<distributed-search>>).
+handles document storage (<<distributed-docs>>), executes distributed search
+(<<distributed-search>>), and what a shard is and how it works
+(<<inside-a-shard>>).
 
 These chapters are not required reading -- you can use Elasticsearch without
 understanding these internals -- but they will provide insight that will make

300_Aggregations/110_docvalues.asciidoc

+6 -6
@@ -1,4 +1,4 @@
-
+[[doc-values]]
 === Doc Values
 
 The default data structure for field data is called _paged-bytes_, and it is
@@ -11,13 +11,13 @@ There is an alternative format known as _doc values_. Doc values are special
 data structures which are built at index-time and written to disk. They are then
 loaded to memory and accessed in place of the standard paged-bytes implementation.
 
-The main benefit of doc values is lower memory footprint. With the default
+The main benefit of doc values is lower memory footprint. With the default
 paged-bytes format, if you attempt to load more field data to memory than available
 heap space...you'll get an OutOfMemoryException.
 
-By contrast, doc values can stream from disk efficiently and do not require
+By contrast, doc values can stream from disk efficiently and do not require
 processing at query-time (unlike paged-bytes, which must be generated). This
-allows you to work with field data that would normally be too large to fit in
+allows you to work with field data that would normally be too large to fit in
 memory.
 
 The trade-off is a larger index size and potentially slower field data access.
@@ -35,7 +35,7 @@ tradeoff for truly massive data.
 ==== Enabling Doc Values
 
 Doc values can be enabled for numeric fields, geopoints and `not_analyzed` string fields.
-They do not currently work with `analyzed` string fields. Doc values are
+They do not currently work with `analyzed` string fields. Doc values are
 enabled in the mapping of a particular field, which means that some fields can
 use doc values while the rest use the default paged-bytes.
 
@@ -56,7 +56,7 @@ PUT /fielddata/filtering/_mapping
 }
 }
 ----
-<1> Doc values can only be enabled on `not_analyzed` string fields, numerics and
+<1> Doc values can only be enabled on `not_analyzed` string fields, numerics and
 geopoints
 <2> Doc values are enabled by setting the `"fielddata.format"` parameter to
 `doc_values`

310_Geolocation.asciidoc

+44 -20
@@ -1,32 +1,56 @@
-[[geoloc]]
-== Geolocation (TODO)
+:ref: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/
 
-The web is increasingly location aware – users expect to see local results,
-or to be able to filter results by their position on a map.
+include::310_Geolocation/10_Intro.asciidoc[]
 
-This chapter explains how to use geolocation in Elasticsearch, including
-optimization tips.
+include::310_Geolocation/20_Geopoints.asciidoc[]
 
+include::310_Geolocation/30_Filter_by_geopoint.asciidoc[]
 
-=== Adding geolocation to your documents
-* Mapping the geo-point type
-* Indexing documents with geo-points
+include::310_Geolocation/32_Bounding_box.asciidoc[]
 
-[[geoloc-filters]]
-=== Geolocation-aware search
-* geo-distance and geo-distance-range filters
-* geo-bounding-box filter
-* geo-polygon filter
+include::310_Geolocation/34_Geo_distance.asciidoc[]
 
-=== Sorting by distance
-.
+include::310_Geolocation/36_Caching_geofilters.asciidoc[]
 
+include::310_Geolocation/38_Reducing_memory.asciidoc[]
 
-=== Geo-shapes
-.
+include::310_Geolocation/40_Geohashes.asciidoc[]
 
+include::310_Geolocation/50_Sorting_by_distance.asciidoc[]
 
-=== Optimizing geo-queries
-.
+include::310_Geolocation/60_Geo_aggs.asciidoc[]
 
+include::310_Geolocation/62_Geo_distance_agg.asciidoc[]
 
+include::310_Geolocation/64_Geohash_grid_agg.asciidoc[]
+
+include::310_Geolocation/66_Geo_bounds_agg.asciidoc[]
+
+include::310_Geolocation/70_Geoshapes.asciidoc[]
+
+include::310_Geolocation/72_Mapping_geo_shapes.asciidoc[]
+
+include::310_Geolocation/74_Indexing_geo_shapes.asciidoc[]
+
+include::310_Geolocation/76_Querying_geo_shapes.asciidoc[]
+
+include::310_Geolocation/78_Indexed_geo_shapes.asciidoc[]
+
+include::310_Geolocation/80_Caching_geo_shapes.asciidoc[]
+
+
+////////
+
+
+
+geo_shape:
+mapping
+tree
+precision
+type of shapes
+indexing
+indexed shapes
+filters
+geoshape
+
+////////

310_Geolocation/10_Intro.asciidoc

+33
@@ -0,0 +1,33 @@
+[[geoloc]]
+== Geolocation
+
+Gone are the days when we wander around a city with paper maps. Thanks to
+smartphones, we now know exactly where we are all of the time, and we expect
+websites to use that information. I'm not interested in restaurants in
+Greater London -- I want to know about restaurants within 5 minutes' walk of my
+current location.
+
+But geolocation is only one part of the puzzle. The beauty of Elasticsearch
+is that it allows you to combine geolocation with full text search, structured
+search, and analytics.
+
+For instance: show me restaurants that mention _vitello tonnato_, are within 5
+minutes' walk, and are open at 11pm, and rank them by a combination of user
+rating, distance and price. Another example: show me a map of holiday rental
+properties available in August throughout the city, and calculate the average
+price per zone.
+
+Elasticsearch offers two ways of representing geolocations: latitude-longitude
+points using the `geo_point` field type, and complex shapes defined in
+http://en.wikipedia.org/wiki/GeoJSON[GeoJSON], using the `geo_shape` field
+type.
+
+Geo-points allow you to find points within a certain distance of another
+point, to calculate distances between two points for sorting or relevance
+scoring, or to aggregate into a grid to display on a map. Geo-shapes, on the
+other hand, are used purely for filtering. They can be used to decide whether
+two shapes overlap or not, or whether one shape completely contains other
+shapes.
+
+
+

310_Geolocation/20_Geopoints.asciidoc

+76
@@ -0,0 +1,76 @@
+[[indexing-geopoints]]
+=== Indexing geo-points
+
+Geo-points cannot be automatically detected with
+<<dynamic-mapping,dynamic mapping>>. Instead, geo-point fields should be
+mapped explicitly:
+
+[source,json]
+-----------------------
+PUT /attractions
+{
+  "mappings": {
+    "restaurant": {
+      "properties": {
+        "name": {
+          "type": "string"
+        },
+        "location": {
+          "type": "geo_point"
+        }
+      }
+    }
+  }
+}
+-----------------------
+
+[[lat-lon-formats]]
+==== Lat/Lon formats
+
+With the `location` field defined as a `geo_point`, we can proceed to index
+documents containing latitude/longitude pairs, which can be formatted as
+strings, arrays, or objects:
+
+[source,json]
+-----------------------
+PUT /attractions/restaurant/1
+{
+  "name": "Chipotle Mexican Grill",
+  "location": "40.715, -74.011" <1>
+}
+
+PUT /attractions/restaurant/2
+{
+  "name": "Pala Pizza",
+  "location": { <2>
+    "lat": 40.722,
+    "lon": -73.989
+  }
+}
+
+PUT /attractions/restaurant/3
+{
+  "name": "Mini Munchies Pizza",
+  "location": [ -73.983, 40.719 ] <3>
+}
+-----------------------
+<1> A string representation, with `"lat,lon"`.
+<2> An object representation with `lat` and `lon` explicitly named.
+<3> An array representation with `[lon,lat]`.
+
+[IMPORTANT]
+========================
+
+Everybody gets caught at least once: string geo-points are
+`"latitude,longitude"`, while array geo-points are `[longitude,latitude]` --
+the opposite order!
+
+Originally, both strings and arrays in Elasticsearch used latitude followed by
+longitude. However, it was decided early on to switch the order for arrays in
+order to conform with GeoJSON.
+
+The result is a bear trap that captures all unsuspecting users on their
+journey to full geo-location nirvana.
+
+========================
+
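
The ordering trap described in the `[IMPORTANT]` block above can be made concrete with a minimal Python sketch. It normalizes the three documented representations to a `(lat, lon)` tuple. The helper name `parse_geo_point` is hypothetical, and this is only an illustration of the documented formats, not Elasticsearch's own parser.

```python
def parse_geo_point(value):
    """Return (lat, lon) for any of the three documented geo-point formats.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # String form is "lat,lon".
        lat_text, lon_text = value.split(",")
        return float(lat_text), float(lon_text)
    if isinstance(value, dict):
        # Object form names both values explicitly.
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        # Array form is [lon, lat]: the opposite order to the string form.
        lon, lat = value
        return float(lat), float(lon)
    raise TypeError("unsupported geo-point representation: %r" % (value,))


# The three example documents from the chapter, in the same order.
for raw in ("40.715, -74.011", {"lat": 40.722, "lon": -73.989}, [-73.983, 40.719]):
    print(parse_geo_point(raw))
```

All three calls yield latitude first, which is the property the array form silently breaks if it is read in string order.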

310_Geolocation/30_Filter_by_geopoint.asciidoc

+44
@@ -0,0 +1,44 @@
+[[filter-by-geopoint]]
+=== Filtering by geo-point
+
+Four geo-filters can be used to include or exclude documents by
+geo-location:
+
+<<geo-bounding-box,`geo_bounding_box`>>::
+
+Find geo-points which fall within the specified rectangle.
+
+<<geo-distance,`geo_distance`>>::
+
+Find geo-points within the specified distance of a central point.
+
+<<geo-distance-range,`geo_distance_range`>>::
+
+Find geo-points within a specified minimum and maximum distance from a
+central point.
+
+`geo_polygon`::
+
+Find geo-points which fall within the specified polygon. *This filter is
+very expensive*. If you find yourself wanting to use it, you should be
+looking at <<geo-shapes,geo-shapes>> instead.
+
+All of these filters work in a similar way: the `lat/lon` values are loaded
+into memory for *all documents in the index*, not just the documents which
+match the query (see <<fielddata-intro>>). Each filter performs a slightly
+different calculation to check whether a point falls into the containing area
+or not.
+
+[TIP]
+============================
+
+Geo-filters are expensive -- they should be used on as few documents as
+possible. First remove as many documents as you can with cheaper filters, like
+`term` or `range` filters, and apply the geo filters last.
+
+The <<bool-filter,`bool` filter>> will do this for you automatically. First it
+applies any bitset-based filters (see <<filter-caching>>) to exclude as many
+documents as it can as cheaply as possible. Then it applies the more
+expensive geo or script filters to each remaining document in turn.
+
+============================
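
The ordering advice in the `[TIP]` above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Elasticsearch executes filters: the `cuisine` field, the `stats` counter, and the helper names are hypothetical, and the haversine formula with a mean Earth radius of 6371 km merely stands in for whatever distance calculation the real `geo_distance` filter performs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates come from the chapter's examples; "cuisine" is made up here.
docs = [
    {"name": "Chipotle Mexican Grill", "cuisine": "mexican", "lat": 40.715, "lon": -74.011},
    {"name": "Pala Pizza", "cuisine": "pizza", "lat": 40.722, "lon": -73.989},
    {"name": "Mini Munchies Pizza", "cuisine": "pizza", "lat": 40.719, "lon": -73.983},
]

stats = {"geo_checks": 0}  # counts how often the expensive check runs


def within_distance(doc, lat, lon, max_km):
    """Expensive per-document check, standing in for a geo filter."""
    stats["geo_checks"] += 1
    return haversine_km(doc["lat"], doc["lon"], lat, lon) <= max_km


def search(documents):
    # Cheap, term-like filter first: discard everything that is not pizza.
    cheap_matches = [d for d in documents if d["cuisine"] == "pizza"]
    # Expensive geo check last, only on the documents that survived.
    return [d["name"] for d in cheap_matches
            if within_distance(d, 40.72, -73.99, 0.5)]


result = search(docs)
print(result, stats["geo_checks"])
```

In this toy data set the distance calculation runs only for the two pizza places, because the cheaper filter has already removed the third document.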

310_Geolocation/32_Bounding_box.asciidoc

+96
@@ -0,0 +1,96 @@
+[[geo-bounding-box]]
+=== `geo_bounding_box` filter
+
+This is by far the most performant geo-filter because its calculation is very
+simple. You provide it with the `top`, `bottom`, `left`, and `right`
+coordinates of a rectangle and all it does is compare the latitude with the
+top and bottom coordinates, and the longitude with the left and right
+coordinates.
+
+[source,json]
+---------------------
+GET /attractions/restaurant/_search
+{
+  "query": {
+    "filtered": {
+      "filter": {
+        "geo_bounding_box": {
+          "location": { <1>
+            "top": 40.8,
+            "bottom": 40.7,
+            "left": -74.0,
+            "right": -73.0
+          }
+        }
+      }
+    }
+  }
+}
+---------------------
+<1> These coordinates can also be specified as `top_left` and `bottom_right`
+pairs, or `bottom_left` and `top_right` pairs.
+
+[[optimize-bounding-box]]
+==== Optimizing bounding boxes
+
+The `geo_bounding_box` is the one geo-filter which doesn't require all
+geo-points to be loaded into memory. Because all it has to do is to check
+whether the `lat` and `lon` values fall within the specified ranges, it can
+use the inverted index to do a glorified `range` filter.
+
+In order to use this optimization, the `geo_point` field must be mapped to
+index the `lat` and `lon` values separately:
+
+[source,json]
+-----------------------
+PUT /attractions
+{
+  "mappings": {
+    "restaurant": {
+      "properties": {
+        "name": {
+          "type": "string"
+        },
+        "location": {
+          "type": "geo_point",
+          "lat_lon": true <1>
+        }
+      }
+    }
+  }
+}
+-----------------------
+<1> The `location.lat` and `location.lon` fields will be indexed separately.
+These fields can be used for searching, but their values cannot be retrieved.
+
+Now, when we run our query, we have to tell Elasticsearch to use the indexed
+`lat` and `lon` values:
+
+[source,json]
+---------------------
+GET /attractions/restaurant/_search
+{
+  "query": {
+    "filtered": {
+      "filter": {
+        "geo_bounding_box": {
+          "type": "indexed", <1>
+          "location": {
+            "top": 40.8,
+            "bottom": 40.7,
+            "left": -74.0,
+            "right": -73.0
+          }
+        }
+      }
+    }
+  }
+}
+---------------------
+<1> Setting the `type` parameter to `indexed` (instead of the default
+`memory`) tells Elasticsearch to use the inverted index for this filter.
+
+IMPORTANT: While a `geo_point` field can contain multiple geo-points, the
+`lat_lon` optimization can only be used on fields which contain a single
+geo-point.
+
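
The bounding-box check and the "glorified `range` filter" idea above can be sketched in Python. Both function names are hypothetical, and the sketch assumes a box that does not cross the 180° meridian. The second function only illustrates which two ranges the `lat_lon` sub-fields would need to satisfy; it is not a claim about the query Elasticsearch builds internally.

```python
def in_bounding_box(lat, lon, top, bottom, left, right):
    """Latitude is checked against top/bottom, longitude against left/right.

    Assumes the box does not wrap around the 180-degree meridian.
    """
    return bottom <= lat <= top and left <= lon <= right


def as_range_filters(field, top, bottom, left, right):
    """Express the same box as two range conditions on the separately
    indexed sub-fields (for example location.lat and location.lon)."""
    return [
        {"range": {field + ".lat": {"gte": bottom, "lte": top}}},
        {"range": {field + ".lon": {"gte": left, "lte": right}}},
    ]


box = {"top": 40.8, "bottom": 40.7, "left": -74.0, "right": -73.0}

# Pala Pizza lies inside the example box; Chipotle is just west of its left edge.
print(in_bounding_box(40.722, -73.989, **box))
print(in_bounding_box(40.715, -74.011, **box))
print(as_range_filters("location", **box))
```

Because each condition is a plain numeric range on one value, a field holding several geo-points cannot be matched this way without mixing the `lat` of one point with the `lon` of another, which is consistent with the single-geo-point restriction noted above.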
