The world is swimming in data. For years we have been simply overwhelmed by the quantity of data at our fingertips. Technology has focused on how to store and structure warehouses full of data. This is superb…until you actually need to use that data.
Elasticsearch is a new technology that focuses on how to find data, because data is not useful just sitting on a hard drive.
You should read this book if you have data, need to find it fast, and need to scale past the capabilities of a single machine.
Maybe you’ve been given a greenfield project and have the flexibility to design the entire system from the ground up. This new project is expected to grow by terabytes in the coming years, and you’re promising aggressive, sub-second response times. The project requires all the features that modern users expect: full-text search, autocompletion, suggestions, and geolocation support. And these features need to scale with your data while maintaining fast response times.
Equally likely, you may be refactoring and replacing older systems. You need to integrate with legacy architecture and existing user interfaces. You need to replicate the functionality of the old system while delivering better performance (which is probably the reason for refactoring in the first place). And crucially, this new architecture needs to scale over time so that you don’t refactor again in a few years.
Or finally, maybe you work in DevOps. While all the other departments are trying to wrangle data into a usable product, you’re trying to keep all of their servers from bursting into flames. You have dozens of clusters with hundreds of machines, all spewing forth logs and warnings, many of which go unnoticed until that 3 a.m. wake-up call. You are looking for a way to take these logs and derive meaning from them before a problem becomes a catastrophe.
Elasticsearch is the answer to all of these scenarios. At its heart, Elasticsearch is a search engine. But if you think about it, aren’t most problems just about searching for the right piece of data that your application needs?
We wrote this book because Elasticsearch needs narrative documentation. The existing reference documentation is excellent…if you know what you are doing. It assumes that you are intimately familiar with information retrieval concepts, distributed systems, the query DSL, and a host of other topics.
This book makes no such assumptions. It has been written so that a complete beginner — to both search and distributed systems — can pick it up and start building a prototype within a few chapters. This book is a soup-to-nuts narrative. We introduce theoretical concepts and build with concrete examples, moving from beginner topics to advanced functionality.
The existing reference documentation explains how to use features. We want this book to explain why and when to use various features.
This book is written around Elasticsearch 1.0. While most of the syntax and concepts are backward-compatible with older versions, you should use Elasticsearch 1.0 in conjunction with this book for the best experience.
This book is organized into three main sections:
Part 1 introduces you to the fundamental concepts in Elasticsearch. After reading it, you should have enough knowledge to build a prototype, navigate the reference documentation, and start to think about more advanced functionality.
- Chapter 1 ("you know, for search…") is a crash-course introduction to Elasticsearch, presenting its wide array of functionality and a taste of the syntax
- Chapter 2 ("life inside a cluster") introduces you to the distributed nature of Elasticsearch
- Chapter 3 ("data in, data out") describes how to get your data into and out of Elasticsearch, while Chapter 4 ("distributed document store") explains how all of this works in a distributed environment
- Chapter 5 ("searching – the basic tools") explains important search terminology and introduces some simple search syntax
- Chapter 6 ("mapping and analysis") and Chapter 7 ("full body search") introduce the more robust Query DSL and how to use it to perform complex searches
- Chapter 8 ("sorting and relevance") dives deeper into search theory and gives you the tools to understand why documents are returned as search results
- Chapter 9 ("distributed search execution") explains how search operates in a distributed environment, and why it differs from other distributed operations
- Finally, Chapter 10 ("index management") closes with some useful administrative tools to manage your indices
Part 2 takes the knowledge gained in Part 1 and applies it to more advanced functionality. These chapters deal with many of the use cases you see in the real world: searching for structured data, dealing with multiple fields, handling synonyms, and so on.
These chapters focus less on the syntax of a query and more on the problem that you are trying to solve. While reading them, you should begin to see parallels with various facets of your own data. Not every chapter will apply to your problem, but the concepts discussed will give you an idea of how to think about finding data.
- Chapter 11 ("structured search") introduces structured search and how it differs from full-text search
- Chapter 12 ("full text search") immerses you in the intricacies of full-text search. This chapter will take your search results from "good" to "great" by giving you a deeper understanding of what is happening inside Elasticsearch
- Chapter 13 ("multi-field search") is entirely about searching multiple fields at the same time, since doing so introduces many additional complexities in scoring and relevance
- Chapter 14 discusses making searches more exact with phrase matching, while Chapter 15 shows how to make your searches less precise with partial matches
- Chapters 16 and 17 deal with the intricacies of written language and how to handle internationalization
- Chapter 18 introduces fuzzy searching and tolerating typos
- Chapter 19 exposes the nitty-gritty details of scoring, boosting, and relevance
- Chapter 20 talks about some of the ways to deal with text that isn’t exactly prose: file paths, encodings, and binary data
Finally, Part 3 deals with topics that aren’t strictly search but use the same underlying infrastructure. In these chapters, you’ll learn about real-time analytics, geolocation search, and reverse search with percolation.
You’ll also learn some common patterns for structuring data, which are useful when planning your system. We wrap up the book with some tips for moving to production and monitoring your cluster’s health.
Because this book focuses more on problem solving in Elasticsearch and less on syntax, we sometimes refer you to the existing documentation for a complete list of parameters. The reference documentation can be found here:
<link>
The following typographical conventions are used in this book:
- Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
- Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
- Constant width bold: Shows commands or other text that should be typed literally by the user.
- Constant width italic: Shows text that should be replaced with user-supplied values or by values determined by context.
Tip: This icon signifies a tip, suggestion, or general note.
Warning: This icon indicates a warning or caution.
Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/oreillymedia/title_title.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Book Title by Some Author (O’Reilly). Copyright 2012 Some Copyright Holder, 978-0-596-xxxx-x.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
Note: Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.
Please address comments and questions concerning this book to the publisher:
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/<catalog page>.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia