|
| 1 | +=== Tutorial |
| 2 | + |
| 3 | +Let's walk through a simple tutorial, which covers basic concepts such as |
| 4 | +_indexing_, _search_ and _aggregations_. The goal of this tutorial is to build |
| 5 | +an understanding of what is possible in Elasticsearch, and how easy it is |
| 6 | +to get started. |
| 7 | + |
| 8 | +We'll introduce some terminology and basic concepts, but it is OK if you don't |
| 9 | +understand everything that is going on. We'll cover all the concepts introduced |
| 10 | +here in _much_ greater depth throughout the rest of the book. |
| 11 | + |
| 12 | +So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of. |
| 13 | + |
| 14 | +==== Let's build an employee directory |
| 15 | + |
| 16 | +We happen to work for **Megacorp**, and as part of HR's new "We love our |
| 17 | +drones!" initiative, we have been tasked with creating an employee directory. |
| 18 | +The directory is supposed to foster empathy and |
| 19 | +synergistic-dynamic-collaboration (or something), so it has a few different |
| 20 | +business requirements: |
| 21 | + |
| 22 | +- Data can contain enumerations, multi-value tags, numbers and full-text |
| 23 | +- Retrieve any employee's full details |
| 24 | +- Allow structured search, such as finding employees over the age of 30 |
| 25 | +- Allow full-text search on the "about" field |
| 26 | +- Enable more complex phrase searches, and highlight snippets from the |
| 27 | +matching documents |
| 28 | +- Enable management to build analytic dashboards over the data |
| 29 | + |
| 30 | +==== Indexing employee documents |
| 31 | + |
| 32 | +The first order of business is storing data about our employees. This will take |
| 33 | +the form of an "employee document", where a single document represents a single |
| 34 | +employee. The act of storing data in Elasticsearch is called _indexing_, but |
| 35 | +before we can index a document we need to decide _where_ to store it. |
| 36 | + |
| 37 | +In Elasticsearch, a document belongs to a _type_, and those types live inside |
| 38 | +an _index_. You can draw some (rough) parallels to a traditional relational database: |
| 39 | + |
| 40 | + Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns |
| 41 | + Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields |
| 42 | + |
| 43 | +An Elasticsearch cluster can contain multiple _indices_ (databases), which in |
| 44 | +turn contain multiple _types_ (tables). These types hold multiple _documents_ |
| 45 | +(rows), and each document has multiple _fields_ (columns). |
| 46 | + |
| 47 | +.Index vs Index vs Index |
| 48 | +************************************************** |
| 49 | +
|
| 50 | +You may already have noticed that the word ``index'' is overloaded with |
| 51 | +several different meanings in the context of Elasticsearch. A little |
| 52 | +clarification is necessary: |
| 53 | +
|
| 54 | +Index (noun):: |
| 55 | +
|
| 56 | +As explained above, an _index_ is like a _database_ in a traditional |
| 57 | +relational database. It is the place to store related documents. The plural of |
| 58 | +_index_ is _indices_ or _indexes_. |
| 59 | +
|
| 60 | +Index (verb):: |
| 61 | +
|
| 62 | +_To index a document_ is to store a document in an _index (noun)_ so that it can |
| 63 | +be retrieved and queried. It is much like the `INSERT` keyword in SQL except |
| 64 | +that, if the document already exists, the new document replaces the old one. |
| 65 | +
|
| 66 | +Inverted index:: |
| 67 | +
|
| 68 | +Relational databases add an _index_, such as a B-Tree index, to specific |
| 69 | +columns in order to improve the speed of data retrieval. Elasticsearch and |
| 70 | +Lucene use a structure called an _inverted index_ for exactly the same |
| 71 | +purpose. |
| 72 | ++ |
| 73 | +By default, every field in a document is _indexed_ -- has an inverted index -- |
| 74 | +and thus is searchable. A field without an inverted index is not searchable. |
| 75 | +See <<inverted-index>> for an explanation of how this structure works. |
| 76 | +
|
| 77 | +************************************************** |
| 78 | + |
| 79 | +So for our employee directory, we are going to do the following: |
| 80 | + |
| 81 | +- Index an `employee` _document_, which contains details about a single employee |
| 82 | +- That document will go into an `employees` _type_ |
| 83 | +- That type will live in a `megacorp` _index_ |
| 84 | +- That index will reside within our Elasticsearch cluster |
| 85 | + |
| 86 | +In practice, this is actually very easy (even though it looks like a lot of |
| 87 | +steps). We can perform all of those actions in a single command: |
| 88 | + |
| 89 | +[source,js] |
| 90 | +-------------------------------------------------- |
| 91 | +curl -XPUT 'localhost:9200/megacorp/employees/1' -d ' |
| 92 | +{ |
| 93 | + "first_name" : "John", |
| 94 | + "last_name" : "Smith", |
| 95 | + "age" : 25, |
| 96 | + "about" : "I love to go rock climbing", |
| 97 | + "interests": ["sports", "music"] |
| 98 | +}' |
| 99 | +-------------------------------------------------- |
| 100 | + |
| 101 | +Notice that the URI (`localhost:9200/megacorp/employees/1`) contains three pieces of information: |
| 102 | + |
| 103 | +- **megacorp** : the index name |
| 104 | +- **employees** : the type name |
| 105 | +- **1** : the ID of this particular employee |
| 106 | + |
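To make the structure of that URI concrete, here is a minimal sketch that assembles it from its parts. The variable names (`host`, `index`, `doc_type`, `id`) are illustrative assumptions and not part of Elasticsearch; the values are the ones used in this tutorial.

```shell
# Values come from the tutorial; the variable names are hypothetical.
host='localhost:9200'
index='megacorp'       # the index name
doc_type='employees'   # the type name
id='1'                 # the ID of this particular employee

# The document address is simply host/index/type/id.
url="${host}/${index}/${doc_type}/${id}"
echo "$url"
```

Changing only `id` addresses a different employee document in the same index and type.
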
| 107 | +The request body -- the JSON document -- contains all the information |
| 108 | +about this employee. His name is "John Smith", he's 25 and enjoys |
| 109 | +rock climbing. |
| 110 | + |
| 111 | +Simple! Notice that there was no need to perform any administrative |
| 112 | +tasks first...just index a document. Elasticsearch will build an index |
| 113 | +for you in the background with default settings, so you can focus on |
| 114 | +other tasks for now. |
| 115 | + |
| 116 | +Before moving on, let's add a few more employees to the directory: |
| 117 | + |
| 118 | +[source,js] |
| 119 | +-------------------------------------------------- |
| 120 | +curl -XPUT 'localhost:9200/megacorp/employees/2' -d ' |
| 121 | +{ |
| 122 | + "first_name" : "Jane", |
| 123 | + "last_name" : "Smith", |
| 124 | + "age" : 32, |
| 125 | + "about" : "I like to collect rock albums", |
| 126 | + "interests": ["music"] |
| 127 | +}' |
| 128 | +
|
| 129 | +curl -XPUT 'localhost:9200/megacorp/employees/3' -d ' |
| 130 | +{ |
| 131 | + "first_name" : "Douglas", |
| 132 | + "last_name" : "Fir", |
| 133 | + "age" : 35, |
| 134 | + "about":"I like to build cabinets", |
| 135 | + "interests": ["forestry"] |
| 136 | +}' |
| 137 | +-------------------------------------------------- |
| 138 | + |
| 139 | + |
| 140 | + |