This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Commit d12d419

Reworked intro and distributed-cluster
Parent: 10ae14f

24 files changed: +532 -984 lines

010_Intro/00_Intro.asciidoc (+1 -1)
@@ -2,7 +2,7 @@ Elasticsearch is a real-time distributed search and analytics engine which
 allows you to explore your data at a speed and at a scale never before
 possible.
 
-It is used for full text search, structured search, analytics and
+It is used for full text search, structured search, analytics, and
 all three in combination. It can run on your laptop, or scale out to hundreds
 of servers and petabytes of data.
 
010_Intro/05_What_is_it.asciidoc (+20 -20)
@@ -1,37 +1,37 @@
 === What is Elasticsearch?
 
-Elasticsearch is a search engine built on top of Apache Lucene, a full-text
-search library. Lucene is arguably the most advanced, performant and fully-featured
-search engine in existence today -- both open source and proprietary.
+Elasticsearch is a search engine built on top of
+https://lucene.apache.org/core/[Apache Lucene(TM)] , a full-text search engine
+library. Lucene is arguably the most advanced, performant and fully-featured
+search engine library in existence today -- both open source and proprietary.
 
-But Lucene is just a library. To leverage it's power you need to work in Java
-and integrate it directly with your application. Worse, you will likely
+But Lucene is just a library. To leverage its power you need to work in Java
+and to integrate Lucene directly with your application. Worse, you will likely
 require a degree in Information Retrieval to understand how it works. Lucene
 is *very* complex.
 
-Elasticsearch aims to make full text search easy by hiding the complexities of
-Lucene behind a simple, coherent API. Elasticsearch uses Lucene internally
-for all of its indexing and search.
+Elasticsearch uses Lucene internally for all of its indexing and search, but
+it aims to make full text search easy by hiding the complexities of Lucene
+behind a simple, coherent API.
 
-However, it is much more than just Lucene and much more than ``just'' full
-text search.
+However, Elasticsearch is much more than just Lucene and much more than
+``just'' full text search. It is also:
 
-Elasticsearch is also:
-
-* a distributed document store where every field is indexed and
+* a distributed document store where *every field* is indexed and
 searchable
 * a distributed search engine with real-time analytics
 * capable of scaling to hundreds of servers and petabytes of structured
 and unstructured data.
 
-And it packages up all of this functionality into a standalone service
-that your application can talk to via a simple RESTful API. Use
-your favorite programming language, Elasticsearch doesn't care.
+And it packages up all of this functionality into a standalone server
+that your application can talk to via a simple RESTful API, using
+a web client from your favorite programming language, or even
+from the command line.
 
-It is easy to get started with Elasticsearch. It ships with
-sensible defaults and hides complicated search theory from beginners.
-It _just works_, right out of the box. With minimal understanding,
-you can soon become productive.
+It is easy to get started with Elasticsearch. It ships with sensible defaults
+and hides complicated search theory away from beginners. It _just works_,
+right out of the box. With minimal understanding, you can soon become
+productive.
 
 As your knowledge grows, you can leverage more of Elasticsearch's
 advanced features. The entire engine is configurable and very flexible.
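The reworded paragraph says your application can talk to Elasticsearch through its RESTful API using a web client from your favorite programming language. As a minimal sketch of what that looks like, assuming Python's standard library and a node on `localhost:9200` (the default address used elsewhere in this diff), a request can be built like this; the helper names `make_request` and `send` are hypothetical:

```python
import urllib.request

# Assumption: a single node is listening on the default HTTP port on this machine.
BASE_URL = "http://localhost:9200"


def make_request(method, path, body=None):
    """Build (but do not send) a request for the Elasticsearch REST API."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        # Label JSON request bodies explicitly; harmless for this API.
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send the request and return (HTTP status, body text). Needs a running node."""
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Against a running node, `send(make_request("GET", "/"))` should return the same root-endpoint JSON that the `curl` check in `10_Installing_ES.asciidoc` prints. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a real client would catch it.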

010_Intro/10_Installing_ES.asciidoc (+15 -14)
@@ -23,9 +23,9 @@ with:
 
 [source,js]
 --------------------------------------------------
-./bin/elasticsearch -f
+./bin/elasticsearch <1>
 --------------------------------------------------
-
+<1> Add `-d` if you want to run it as a daemon.
 
 Test it out by opening another terminal window and running:
 
@@ -40,24 +40,25 @@ You should see a response like this:
 [source,js]
 --------------------------------------------------
 {
-"tagline" : "You Know, for Search",
-"ok" : true,
-"status" : 200,
-"name" : "Contrary",
-"version" : {
-"number" : "0.20.2",
-"snapshot_build" : false
-}
+"status": 200,
+"name": "Shrunken Bones",
+"version": {
+"number": "1.0.0",
+"lucene_version": "4.6"
+},
+"tagline": "You Know, for Search"
 }
 --------------------------------------------------
 
-
 This means that your Elasticsearch _cluster_ is up and running, and we can
 start experimenting with it.
 
 .Clusters and nodes
 ****
-A _node_ is a running instance of Elasticsearch. A _cluster_ is a group
-of nodes that are working together to share data and to provide failover and
-scale, although a single node can form a cluster by itself.
+
+A _node_ is a running instance of Elasticsearch. A _cluster_ is a group of
+nodes with the same `cluster.name` that are working together to share data,
+and to provide failover and scale, although a single node can form a cluster
+all by itself.
+
 ****
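The installation check now expects the 1.0.0 response shape (`status`, `name`, `version.number`, `version.lucene_version`, `tagline`) instead of the 0.20.2 one with `ok` and `snapshot_build`. A small sketch, assuming Python and taking the sample body verbatim from the diff, that reads those fields; `summarize_root_response` is a hypothetical helper, and the field names may differ in other releases:

```python
import json

# The 1.0.0 response body from the diff. Your node name and version will differ.
SAMPLE_ROOT_RESPONSE = """
{
  "status": 200,
  "name": "Shrunken Bones",
  "version": {
    "number": "1.0.0",
    "lucene_version": "4.6"
  },
  "tagline": "You Know, for Search"
}
"""


def summarize_root_response(text):
    """Return (node name, Elasticsearch version, Lucene version) from the root endpoint."""
    doc = json.loads(text)
    # This response shape includes "status"; treat a missing key as OK rather than failing.
    if doc.get("status", 200) != 200:
        raise RuntimeError(f"unexpected status: {doc['status']}")
    version = doc["version"]
    return doc["name"], version["number"], version.get("lucene_version")


print(summarize_root_response(SAMPLE_ROOT_RESPONSE))
# ('Shrunken Bones', '1.0.0', '4.6')
```

In practice the text would come from the `curl` call or an HTTP client rather than a literal string; the parsing is the same.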

010_Intro/15_API.asciidoc (+19 -15)
@@ -19,11 +19,11 @@ Transport client::
 forwards requests to a node in the cluster.
 
 Both Java clients talk to the cluster over *port 9300*, using the native
-Elasticsearch protocol. The nodes in the cluster also communicate
+Elasticsearch _transport_ protocol. The nodes in the cluster also communicate
 with each other over port 9300. If this port is not open, then your nodes will
 not be able to form a cluster.
 
-[NOTE]
+[TIP]
 ====
 The Java client must be from the same version of Elasticsearch as the nodes,
 otherwise they may not be able to understand each other.
@@ -32,7 +32,7 @@ otherwise they may not be able to understand each other.
 ==== RESTful API with JSON over HTTP
 
 All other languages can communicate with Elasticsearch over *port 9200* using
-a RESTful API, accessible with your favorite HTTP library. In fact, as you have
+a RESTful API, accessible with your favorite web client. In fact, as you have
 seen above, you can even talk to Elasticsearch from the command line, using the
 `curl` command.
 
@@ -51,10 +51,13 @@ use:
 
 [source,js]
 --------------------------------------------------
-GET /_count?pretty
+curl -XGET 'http://localhost:9200/_count?pretty' -d '
 {
-"match_all": {}
+"query": {
+"match_all": {}
+}
 }
+'
 --------------------------------------------------
 
 All responses consist of:
@@ -85,18 +88,20 @@ switch:
 
 [source,js]
 --------------------------------------------------
-curl -I -XGET 'localhost:9200/'
+curl -i -XGET 'localhost:9200/'
 --------------------------------------------------
 
-
 We will use this `curl` format in all of our examples because it is easy to
 read and easy to translate into a request using the HTTP library of your
 choice.
 
-.Curl Shorthand
+.`curl` shorthand
 ****
-For the rest of the book, you'll notice curl examples in the shorthand format
-used above. The curl command is shortened from:
+
+For the rest of the book, we will show `curl` examples using a shorthand
+format that leaves out all of the bits that are the same in every request,
+like the hostname and port, and the `curl` command itself. Instead of showing
+a full request like:
 
 [source,js]
 --------------------------------------------------
@@ -106,17 +111,16 @@ curl -XGET 'localhost:9200/_count?pretty' -d '
 }'
 --------------------------------------------------
 
-to simply:
+we will show it in this shorthand format:
 
 [source,js]
 --------------------------------------------------
-GET /_count?pretty
+GET /_count
 {
 "match_all": {}
 }
 --------------------------------------------------
 
-This is done to space and repetitive typing. We will show you the HTTP method
-(GET, PUT, POST, etc) and the URL path which follows `localhost:9200`. The
-rest of the lines following the curl command are the body of the request.
+TODO: Link to the Sense plugin?
+
 ****
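The new sidebar says the shorthand leaves out what is the same in every request: the `curl` command itself, the hostname and the port. The expansion back to a full command can be sketched in Python as below; `expand_shorthand` is a hypothetical helper, and `http://localhost:9200` is assumed as the omitted base URL:

```python
import shlex

# Assumption: the host and port that the shorthand leaves out (the book's defaults).
DEFAULT_BASE = "http://localhost:9200"


def expand_shorthand(shorthand, base=DEFAULT_BASE):
    """Expand 'METHOD /path' (plus optional body lines) into a full curl command."""
    lines = shorthand.strip().splitlines()
    method, path = lines[0].split(maxsplit=1)
    body = "\n".join(lines[1:]).strip()
    parts = ["curl", "-X" + method.upper(), base + path]
    if body:
        parts += ["-d", body]
    # shlex.quote only adds quotes where the shell needs them.
    return " ".join(shlex.quote(p) for p in parts)


print(expand_shorthand('GET /_count?pretty\n{\n"match_all": {}\n}'))
```

For that input it prints the same `curl -XGET 'http://localhost:9200/_count?pretty' -d '...'` form as the full request in the sidebar. Because of `shlex.quote`, a URL with no shell metacharacters (such as `/_count` without `?pretty`) comes out unquoted, which the shell treats identically.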

010_Intro/20_Document.asciidoc (+21 -24)
@@ -2,27 +2,27 @@
 
 Objects in an application are seldom just a simple list of keys and values.
 More often than not they are complex data structures which may contain other
-objects, or arrays of values.
+dates, geo-locations, objects, or arrays of values.
 
-When using a relational database to store these objects, you flatten
-the object to fit the table schema (usually one field per column). When
-retrieving the object from the database, you have to reconstruct it
-from the flat representation.
+Sooner or later you're going to want to store these objects in a database.
+Trying to do this with the rows and columns of a relational database is the
+equivalent of trying to squeeze your rich expressive objects into a very big
+spreadsheet: you have to flatten the object to fit the table schema -- usually
+one field per column -- and then have to reconstruct it every time you
+retrieve it.
 
-Elasticsearch is _document oriented_, meaning that it stores and
-indexes entire _documents_. In Elasticsearch, you index, search,
-sort and filter documents...not rows of data. This is a fundamentally different
-way of thinking about data and is one of the reasons Elasticsearch can
-perform complex full text search.
+Elasticsearch is _document oriented_, meaning that it stores and indexes
+entire objects or _documents_. In Elasticsearch, you index, search, sort and
+filter documents... not rows of columnar data. This is a fundamentally
+different way of thinking about data and is one of the reasons Elasticsearch
+can perform complex full text search.
 
+==== JSON
 
-[NOTE]
-====
-Elasticsearch uses JSON as the serialization format for documents.
-
-JSON serialization is supported by most programming languages, and has become
-the standard format used by the NoSQL movement. It is simple, concise and easy
-to read.
+Elasticsearch uses _JSON_ (or Javascript Object Notation ) as the
+serialization format for documents. JSON serialization is supported by most
+programming languages, and has become the standard format used by the NoSQL
+movement. It is simple, concise and easy to read.
 
 Consider this JSON document which represents a user object:
 
@@ -41,10 +41,7 @@ Consider this JSON document which represents a user object:
 }
 --------------------------------------------------
 
-
-Although the original user object was complex, the
-structure of the object has been retained in the JSON version.
-Converting an object to JSON for indexing in Elasticsearch
-is much simpler than the equivalent process for a flat table structure.
-====
-
+Although the original `user` object was complex, the structure and meaning of
+the object has been retained in the JSON version. Converting an object to JSON
+for indexing in Elasticsearch is much simpler than the equivalent process for
+a flat table structure.
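The rewritten paragraphs contrast flattening an object into table columns with storing it whole as a JSON document. A minimal Python sketch of that contrast; the `flatten` helper and its `parent_child` column naming are hypothetical, joining arrays into a string is just one naive choice, and the `last` name field is an assumed value (the diff only shows `first`):

```python
import json

# Illustrative values: "last" is assumed; the other fields follow the book's user example.
user = {
    "email": "john@smith.com",
    "name": {"first": "John", "last": "Smith"},
    "interests": ["dolphins", "whales"],
}


def flatten(obj, prefix=""):
    """Flatten nested dicts into one column per leaf value, as a single table row would need."""
    row = {}
    for key, value in obj.items():
        column = prefix + key
        if isinstance(value, dict):
            row.update(flatten(value, column + "_"))
        elif isinstance(value, list):
            # One naive (lossy) choice: a single column cannot hold a list, so join it.
            row[column] = ",".join(str(v) for v in value)
        else:
            row[column] = value
    return row


row = flatten(user)
# {'email': 'john@smith.com', 'name_first': 'John', 'name_last': 'Smith', 'interests': 'dolphins,whales'}

doc = json.dumps(user)
# The JSON round trip gives back the same nested structure, lists included.
assert json.loads(doc) == user
```

Reading the row back would need extra code to rebuild `name` and split `interests`, which is the reconstruction cost the text describes; the JSON document needs none.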

010_Intro/25_CRUD.asciidoc (+25 -20)
@@ -1,17 +1,19 @@
 === Indexing documents
 
-Before we can _index_ (store) our user document in Elasticsearch, we need
-to decide what the document represents, and where to store it.
+Before we can _index_ (store and make searchable) our user document in
+Elasticsearch, we need to decide what the document represents, and where to
+store it.
 
 In Elasticsearch, a document belongs to a _type_, and those types live inside
 an _index_. You can draw some (rough) parallels to a traditional relational database:
 
-- RDBM => Databases => Tables => Columns/Rows
-- Elasticsearch => Indices => Types => Documents with Fields
 
-An Elasticsearch cluster can contain multiple Indices (databases), which in
-turn contain multiple Types (tables). These types hold multiple Documents (rows),
-and each document has Fields (columns).
+Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
+Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
+
+An Elasticsearch cluster can contain multiple _indices_ (databases), which in
+turn contain multiple _types_ (tables). These types hold multiple _documents_
+(rows), and each document has multiple _fields_ (columns).
 
 ==== An example
 We are going to store a document in the `blogs` index, as type `user`, and we
@@ -21,8 +23,9 @@ than in the document itself:
 
 [source,js]
 --------------------------------------------------
-PUT /blogs/user/johnsmith?pretty
-{
+<1> <2> <3>
+PUT /blogs/user/johnsmith
+{ <4>
 "email": "john@smith.com",
 "name": {
 "first": "John",
@@ -34,19 +37,22 @@ PUT /blogs/user/johnsmith?pretty
 "interests": ["dolphins", "whales"]
 }
 --------------------------------------------------
-
+<1> Index: `blogs`
+<2> Type: `user`
+<3> ID: `johnsmith`
+<4> Document body
 
 And we receive the following response, which confirms that our document
 has been indexed correctly:
 
 [source,js]
 --------------------------------------------------
 {
-"ok" : true,
-"_index" : "blogs",
-"_type" : "user",
-"_id" : "johnsmith",
-"_version" : 1
+"_index": "blogs",
+"_type": "user",
+"_id": "johnsmith",
+"_version": 1,
+"created": true
 }
 --------------------------------------------------
 
@@ -55,8 +61,8 @@ Congratulations! You just indexed your first document! How easy was that?
 
 === Real-time GET
 
-Elasticsearch has _real-time GET_. In other words, as soon as the document
-has been indexed, it can be retrieved from any node in the cluster.
+Elasticsearch has _real-time GET_. In other words, as soon as a document
+has been indexed it can be retrieved from any node in the cluster.
 
 Not only that, but changes to documents are _persistent_: if the whole cluster
 were to suffer a power failure immediately after indexing a document, the
@@ -67,10 +73,9 @@ that we specified when indexing it:
 
 [source,js]
 --------------------------------------------------
-GET /blogs/user/johnsmith?pretty
+GET /blogs/user/johnsmith
 --------------------------------------------------
 
-
 The response contains the exact same JSON document that we indexed, as the
 `_source` field, plus some extra metadata:
 
@@ -81,7 +86,7 @@ The response contains the exact same JSON document that we indexed, as the
 "_type" : "user",
 "_id" : "johnsmith",
 "_version" : 1,
-"exists" : true,
+"found" : true,
 "_source" : {
 "email": "john@smith.com",
 "name": {

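The diff moves the indexing and retrieval examples to the 1.0 response format (`created` on indexing, `found` instead of `exists` on retrieval) and drops `?pretty` from the URLs. A minimal sketch of client-side helpers for that format, assuming Python's standard library; the function names are hypothetical, and percent-encoding each path segment is a precaution of this sketch rather than something the book discusses:

```python
import json
from urllib.parse import quote


def document_path(index, doc_type, doc_id):
    """Build the /{index}/{type}/{id} path used by both PUT (index) and GET (retrieve)."""
    return "/" + "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))


def was_created(index_response):
    """True if an index response reports a newly created document ("created" key)."""
    return bool(json.loads(index_response).get("created", False))


def source_if_found(get_response):
    """Return _source from a GET response, or None if the document does not exist.

    The newer format uses "found"; the older one removed in this commit used "exists".
    """
    doc = json.loads(get_response)
    found = doc.get("found", doc.get("exists", False))
    return doc.get("_source") if found else None
```

With any HTTP client, `document_path("blogs", "user", "johnsmith")` yields `/blogs/user/johnsmith`, the path shared by the `PUT` and the `GET` in the diff; the two response helpers then read the bodies those requests return.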