This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Preface

The world is swimming in data. For years we have been simply overwhelmed by the quantity of data at our fingertips. Technology has focused on how to store and structure warehouses full of data. This is superb… until you actually need to use that data.

Elasticsearch is a new technology that focuses on how to find data, because data is not useful just sitting on a hard drive.

Who Should Read This Book

You should read this book if you have data, need to find it fast, and need to scale past the capabilities of a single machine.

Maybe you’ve been given a greenfield project and have the flexibility to design the entire system from the ground up. This new project is expected to grow by terabytes in the coming years and you’re promising aggressive, sub-second response times. This project requires all the features that modern users expect: full-text search, autocompletion, suggestions, geolocation support. And these features need to scale with your data and maintain fast response times.

Equally likely, you may be refactoring and replacing older systems. You need to integrate with legacy architecture and existing user interfaces. You need to replicate the functionality of the old system while delivering better performance (which is probably the reason for refactoring in the first place). And crucially, this new architecture needs to scale over time so that you don’t refactor again in a few years.

Or finally, maybe you are in DevOps. While all the other departments are trying to wrangle data into a usable product, you’re trying to keep all of their servers from bursting into flames. You have dozens of clusters with hundreds of machines, all spewing forth logs and warnings, many of which go unnoticed until that 3 a.m. wake-up call. You are looking for a way to take these logs and derive meaning from them before a problem becomes a catastrophe.

Elasticsearch is the answer to all of these scenarios. At its heart, Elasticsearch is a search engine. But if you think about it, aren’t most problems just about searching for the right piece of data that your application needs?

Why We Wrote This Book

We wrote this book because Elasticsearch needs narrative documentation. The existing reference documentation is excellent… if you know what you are doing. It assumes that you are intimately familiar with information retrieval concepts, distributed systems, the Query DSL, and a host of other topics.

This book makes no such assumptions. It has been written so that a complete beginner — to both search and distributed systems — can pick it up and start building a prototype within a few chapters. This book is a soup-to-nuts narrative. We introduce theoretical concepts and build with concrete examples, moving from beginner topics to advanced functionality.

The existing reference documentation explains how to use features. We want this book to explain why and when to use various features.

A Word on [something] Today

Ehm?? Need this section?

Navigating This Book

This book is written around Elasticsearch 1.0. While most of the syntax and concepts are backwards-compatible with older versions, you should use Elasticsearch 1.0 in conjunction with this book for the best experience.

This book is organized into three main sections:

Part 1: "Getting Started"

This section introduces you to the fundamental concepts in Elasticsearch. After reading this section of the book, you should have enough knowledge to build a prototype, navigate the reference documentation and start to think about more advanced functionality.

  • Chapter 1 ("you know, for search…") is a crash-course introduction to Elasticsearch, introducing the wide array of functionality and a taste of the syntax

  • Chapter 2 ("life inside a cluster") introduces you to the distributed nature of Elasticsearch

  • Chapter 3 ("data in, data out") describes how to get your data into and out of Elasticsearch, while Chapter 4 ("distributed document store") explains how all of this works in a distributed environment

  • Chapter 5 ("searching – the basic tools") explains important search terminology and introduces some simple search syntax

  • Chapter 6 ("mapping and analysis") and Chapter 7 ("full body search") introduce the more robust Query DSL and how to use it to perform complex searches

  • Chapter 8 ("sorting and relevance") dives deeper into search theory and gives you the tools to understand why documents are returned as a search result

  • Chapter 9 ("distributed search execution") explains how search operates in a distributed environment, and why it is different from other distributed operations

  • Finally, Chapter 10 ("index management") closes with some useful administrative tools to manage your indices

Part 2: "Search in Depth"

This section takes the knowledge gained in Part 1 and begins to apply it to advanced functionality. These chapters deal with many of the use cases you see in the real world: searching for structured data, dealing with multiple fields, handling synonyms, and so on.

These chapters focus less on the syntax of a query, and more on the problem that you are trying to solve. While reading these chapters, you should begin to see parallels to various facets of your data. Not all sections will apply to your problem, but the concepts discussed will give you an idea of how to think about finding data.

  • Chapter 11 ("structured search") introduces structured search and how it differs from full-text searching

  • Chapter 12 ("full text search") immerses you in the intricacies of full-text search. This section will take your search results from "good" to "great", by giving you a deeper understanding of what is happening inside of Elasticsearch

  • Chapter 13 ("multi-field search") is entirely about searching multiple fields at the same time, since this introduces many additional complexities in scoring and relevancy

  • Chapter 14 discusses making searches more exact with phrase matching, while Chapter 15 shows how to make your searches less precise with partial matches

  • Chapters 16 and 17 deal with the intricacies of written language and how to handle internationalization

  • Chapter 18 introduces fuzzy searching and tolerating typos

  • Chapter 19 exposes the nitty-gritty details of scoring, boosting and relevance

  • Chapter 20 talks about some of the ways to deal with text that isn’t exactly prose: file paths, encodings, binary data

Part 3: "<something>"

Finally, Part 3 deals with topics that aren’t strictly search, but utilize the same underlying infrastructure. In these sections, you’ll learn about real-time analytics, geolocation search and reversed-search with percolation.

You’ll also learn some common patterns of structuring data, which are useful when planning your system. Finally, we wrap up the book with some tips for moving to production and how to monitor your cluster’s health.

Online Resources

Because this book focuses on problem solving in Elasticsearch rather than on syntax, we sometimes refer you to the existing reference documentation for a complete list of parameters. The reference documentation can be found here:

<link>

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This icon signifies a tip, suggestion, or general note.

Warning

This icon indicates a warning or caution.

Using Code Examples

PROD: Please reach out to author to find out if they will be uploading code examples to oreilly.com or their own site (e.g., GitHub). If there is no code download, delete this whole section. If there is, when you email digidist with the link, let them know what you filled in for title_title (should be as close to book title as possible, i.e., learning_python_2e). This info will determine where digidist loads the files.

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/oreillymedia/title_title.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Book Title by Some Author (O’Reilly). Copyright 2012 Some Copyright Holder, 978-0-596-xxxx-x.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/<catalog page>.

Don't forget to update the link above.

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Acknowledgments

Fill in...