Elasticsearch::Persistence

Persistence layer for Ruby domain objects in Elasticsearch, using the Repository and ActiveRecord patterns.

The library is compatible with Ruby 1.9.3 (or higher) and Elasticsearch 1.0 (or higher).

Installation

Install the package from Rubygems:

gem install elasticsearch-persistence

To use an unreleased version, either add it to your Gemfile for Bundler:

gem 'elasticsearch-persistence', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'

or install it from a source code checkout:

git clone https://github.com/elasticsearch/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-persistence
bundle install
rake install

Usage

The Repository Pattern

The Elasticsearch::Persistence::Repository module provides an implementation of the repository pattern. It lets you save, delete, find and search objects stored in Elasticsearch, and configure the mappings and settings for the index.

Let's start with a simple plain old Ruby object (PORO):

class Note
  attr_reader :attributes

  def initialize(attributes={})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

Let's create a default, "dumb" repository, as a first step:

require 'elasticsearch/persistence'
repository = Elasticsearch::Persistence::Repository.new

We can save a Note instance into the repository...

note = Note.new id: 1, text: 'Test'

repository.save(note)
# PUT http://localhost:9200/repository/note/1 [status:201, request:0.210s, query:n/a]
# > {"id":1,"text":"Test"}
# < {"_index":"repository","_type":"note","_id":"1","_version":1,"created":true}

...find it...

n = repository.find(1)
# GET http://localhost:9200/repository/_all/1 [status:200, request:0.003s, query:n/a]
# < {"_index":"repository","_type":"note","_id":"1","_version":2,"found":true, "_source" : {"id":1,"text":"Test"}}
=> <Note:0x007fcbfc0c4980 @attributes={"id"=>1, "text"=>"Test"}>

...search for it...

repository.search(query: { match: { text: 'test' } }).first
# GET http://localhost:9200/repository/_search [status:200, request:0.005s, query:0.002s]
# > {"query":{"match":{"text":"test"}}}
# < {"took":2, ... "hits":{"total":1, ... "hits":[{ ... "_source" : {"id":1,"text":"Test"}}]}}
=> <Note:0x007fcbfc1c7b70 @attributes={"id"=>1, "text"=>"Test"}>

...or delete it:

repository.delete(note)
# DELETE http://localhost:9200/repository/note/1 [status:200, request:0.014s, query:n/a]
# < {"found":true,"_index":"repository","_type":"note","_id":"1","_version":3}
=> {"found"=>true, "_index"=>"repository", "_type"=>"note", "_id"=>"1", "_version"=>2}
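The save, find and delete calls above can be pictured without a running Elasticsearch node. The following is a minimal sketch of the repository pattern only: InMemoryRepository is a hypothetical stand-in that keeps documents in a Hash, and it is not part of the gem or a description of how the gem is implemented.

```ruby
# A plain Ruby object that exposes its state through #to_hash.
class Note
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

# Hypothetical in-memory stand-in for a repository; NOT part of the gem.
class InMemoryRepository
  def initialize(klass)
    @klass = klass
    @store = {}
  end

  # Store a copy of the serialized object under its ID.
  def save(document)
    hash = document.to_hash
    @store[hash[:id].to_s] = hash.dup
    document
  end

  # Rebuild a domain object from the stored hash, or return nil.
  def find(id)
    hash = @store[id.to_s]
    hash && @klass.new(hash.dup)
  end

  # Accept either the object itself or its ID; return true when something was removed.
  def delete(document_or_id)
    id = document_or_id.respond_to?(:to_hash) ? document_or_id.to_hash[:id] : document_or_id
    !@store.delete(id.to_s).nil?
  end
end

repository = InMemoryRepository.new(Note)
note = Note.new(id: 1, text: 'Test')

repository.save(note)
found = repository.find(1)
puts found.attributes[:text]

repository.delete(note)
puts repository.find(1).inspect
```

The point of the pattern is that Note knows nothing about storage: swapping the Hash for Elasticsearch changes the repository, not the domain object.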

The repository module provides a number of features and facilities to configure and customize the behaviour:

  • Configuring the Elasticsearch client being used
  • Setting the index name, document type, and object class for deserialization
  • Composing mappings and settings for the index
  • Creating, deleting or refreshing the index
  • Finding or searching for documents
  • Providing access both to domain objects and hits for search results
  • Providing access to the Elasticsearch response for search results (aggregations, total, ...)
  • Defining the methods for serialization and deserialization

You can use the default repository class, or include the module in your own. Let's review it in detail.

The Default Class

For simple cases, you can use the default, bundled repository class, and configure/customize it:

repository = Elasticsearch::Persistence::Repository.new do
  # Configure the Elasticsearch client
  client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true

  # Set a custom index name
  index :my_notes

  # Set a custom document type
  type  :my_note

  # Specify the class to initialize when deserializing documents
  klass Note

  # Configure the settings and mappings for the Elasticsearch index
  settings number_of_shards: 1 do
    mapping do
      indexes :text, analyzer: 'snowball'
    end
  end

  # Customize the serialization logic
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end

  # Customize the de-serialization logic
  def deserialize(document)
    puts "# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****"
    super
  end
end

The repository will now use the custom Elasticsearch client, the custom index and type names, and the custom serialization and de-serialization logic.

We can create the index with the desired settings and mappings:

repository.create_index! force: true
# PUT http://localhost:9200/my_notes
# > {"settings":{"number_of_shards":1},"mappings":{ ... {"text":{"analyzer":"snowball","type":"string"}}}}}

Save the document with extra properties added by the serialize method:

repository.save(note)
# PUT http://localhost:9200/my_notes/my_note/1
# > {"id":1,"text":"Test","my_special_key":"my_special_stuff"}
{"_index"=>"my_notes", "_type"=>"my_note", "_id"=>"1", "_version"=>4, ... }

And deserialize it:

repository.find(1)
# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****
<Note:0x007f9bd782b7a0 @attributes={... "my_special_key"=>"my_special_stuff"}>
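The configuration block above calls methods such as index with an argument, while other examples in this document use the index = ... assignment form. One common Ruby idiom that supports both styles is sketched below. Configurable and SketchRepository are hypothetical names, and this is a generic illustration of the idiom, not the gem's source code.

```ruby
# Hypothetical module illustrating the idiom; not the gem's source code.
module Configurable
  # With an argument the method acts as a setter, without one as a getter.
  def index(name = nil)
    @index = name.to_s if name
    @index
  end

  # Conventional assignment-style setter for the same value.
  def index=(name)
    @index = name.to_s
  end
end

class SketchRepository
  include Configurable

  # Evaluate the block in the context of the new instance,
  # so bare calls like `index :my_notes` reach the methods above.
  def initialize(&block)
    instance_eval(&block) if block
  end
end

repository = SketchRepository.new do
  index :my_notes
end
puts repository.index

repository.index = 'notes_development'
puts repository.index
```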

A Custom Class

In most cases, though, you'll want to use a custom class for the repository, so let's do that:

require 'base64'

class NoteRepository
  include Elasticsearch::Persistence::Repository

  def initialize(options={})
    index  options[:index] || 'notes'
    client Elasticsearch::Client.new url: options[:url], log: options[:log]
  end

  klass Note

  settings number_of_shards: 1 do
    mapping do
      indexes :text,  analyzer: 'snowball'
      # Do not index images
      indexes :image, index: 'no'
    end
  end

  # Base64 encode the "image" field in the document
  #
  def serialize(document)
    hash = document.to_hash.clone
    hash['image'] = Base64.encode64(hash['image']) if hash['image']
    hash.to_hash
  end

  # Base64 decode the "image" field in the document
  #
  def deserialize(document)
    hash = document['_source']
    hash['image'] = Base64.decode64(hash['image']) if hash['image']
    klass.new hash
  end
end

Include the Elasticsearch::Persistence::Repository module to add the repository methods into the class.

You can customize the repository in the familiar way, by calling the DSL-like methods.

You can implement a custom initializer for your repository, add complex logic in its class and instance methods -- in general, have all the freedom of a standard Ruby class.

repository = NoteRepository.new url: 'http://localhost:9200', log: true

# Configure the repository instance
repository.index = 'notes_development'
repository.client.transport.logger.formatter = proc { |s, d, p, m| "\e[2m# #{m}\n\e[0m" }

repository.create_index! force: true

note = Note.new 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...'

repository.save(note)
# PUT http://localhost:9200/notes_development/note/1
# > {"id":1,"text":"Document with image","image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}
puts repository.find(1).attributes['image']
# GET http://localhost:9200/notes_development/note/1
# < {... "_source" : { ... "image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}}
# => ... BINARY DATA ...
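The Base64 round trip performed by serialize and deserialize can be checked in isolation, using only the standard library. In this sketch, encode_image and decode_image are hypothetical helper names that mirror the two repository methods; they work on copies, so the caller's hash is not modified.

```ruby
require 'base64'

# Mirrors NoteRepository#serialize: encode the "image" field if present.
def encode_image(attributes)
  hash = attributes.dup
  hash['image'] = Base64.encode64(hash['image']) if hash['image']
  hash
end

# Mirrors NoteRepository#deserialize: decode the "image" field if present.
def decode_image(source)
  hash = source.dup
  hash['image'] = Base64.decode64(hash['image']) if hash['image']
  hash
end

original = { 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...' }
stored   = encode_image(original)
restored = decode_image(stored)

puts stored['image'].inspect
puts restored['image']
```

Base64.encode64 appends a trailing newline, which is why the value in the request log above ends with \n.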

Methods Provided by the Repository

Client

The repository uses the standard Elasticsearch client, which is accessible with the client getter and setter methods:

repository.client = Elasticsearch::Client.new url: 'http://search.server.org'
repository.client.transport.logger = Logger.new(STDERR)

Naming

The index method specifies the Elasticsearch index to use for storage, lookup and search (when not set, the value is inferred from the repository class name):

repository.index = 'notes_development'

The type method specifies the Elasticsearch document type to use for storage, lookup and search (when not set, the value is inferred from the document class name, or _all is used):

repository.type = 'my_note'

The klass method specifies the Ruby class name to use when initializing objects from documents retrieved from the repository (when not set, the value is inferred from the document _type as fetched from Elasticsearch):

repository.klass = MyNote

Index Configuration

The settings and mappings methods, provided by the elasticsearch-model gem, let you configure the index properties:

repository.settings number_of_shards: 1
repository.settings.to_hash
# => {:number_of_shards=>1}

repository.mappings { indexes :title, analyzer: 'snowball' }
repository.mappings.to_hash
# => { :note => {:properties=> ... }}

The convenience methods create_index!, delete_index! and refresh_index! allow you to manage the index lifecycle.

Serialization

The serialize and deserialize methods allow you to customize the serialization of the document when passing it to the storage, and the initialization procedure when loading it from the storage:

class NoteRepository
  def serialize(document)
    Hash[document.to_hash.map { |k, v| [k, k == :title ? v.upcase : v] }]
  end
  def deserialize(document)
    MyNote.new ActiveSupport::HashWithIndifferentAccess.new(document['_source']).deep_symbolize_keys
  end
end
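The reason deserialize has to deal with keys at all is that the document passes through JSON on its way to and from Elasticsearch, and JSON has no symbol type. The sketch below reproduces that round trip with the standard library only. It uses transform_keys instead of ActiveSupport, which is an assumption made to keep the example dependency-free; unlike deep_symbolize_keys it only converts the top-level keys.

```ruby
require 'json'

attributes = { id: 1, title: 'Quick Brown Fox' }

# Serialize: upcase the title, leave the other fields untouched.
serialized = attributes.to_h { |key, value| [key, key == :title ? value.upcase : value] }

# JSON has no symbol type, so the keys come back as strings.
source = JSON.parse(JSON.generate(serialized))

# Deserialize: turn the keys back into symbols before building the object.
restored = source.transform_keys(&:to_sym)

puts source.keys.inspect
puts restored.keys.inspect
puts restored[:title]
```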

Storage

The save method allows you to store a domain object in the repository:

note = Note.new id: 1, title: 'Quick Brown Fox'
repository.save(note)
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>1, "created"=>true}

The update method allows you to perform a partial update of a document in the repository. Use either a partial document:

repository.update id: 1, title: 'UPDATED',  tags: []
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>2}

Or a script (optionally with parameters):

repository.update 1, script: 'if (!ctx._source.tags.contains(t)) { ctx._source.tags += t }', params: { t: 'foo' }
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>3}
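The effect of the two update forms on the stored document can be approximated with plain Ruby hash operations. This is only a mental model: the real merge and the script both run inside Elasticsearch, and apply_partial and apply_tag_script are hypothetical helpers written for this illustration.

```ruby
# Partial document: the given fields are merged into the stored source.
def apply_partial(source, partial)
  source.merge(partial)
end

# Ruby equivalent of the script above: append the tag only when it is missing.
# Like the script, this assumes the "tags" field already exists.
def apply_tag_script(source, tag)
  tags = source.fetch('tags')
  tags.include?(tag) ? source : source.merge('tags' => tags + [tag])
end

stored = { 'id' => 1, 'title' => 'Quick Brown Fox' }
stored = apply_partial(stored, { 'title' => 'UPDATED', 'tags' => [] })
stored = apply_tag_script(stored, 'foo')
stored = apply_tag_script(stored, 'foo') # a second run changes nothing

puts stored['title']
puts stored['tags'].join(', ')
```

Fields that are not mentioned in the partial document, such as id here, are left as they were.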

The delete method lets you remove objects from the repository (pass either the object itself or its ID):

repository.delete(note)
repository.delete(1)

Finding

The find method looks up one or many documents in the storage and returns them as deserialized Ruby objects:

repository.save Note.new(id: 2, title: 'Fast White Dog')

note = repository.find(1)
# => <MyNote ... QUICK BROWN FOX>

notes = repository.find(1, 2)
# => [<MyNote... QUICK BROWN FOX>, <MyNote ... FAST WHITE DOG>]

When a document with a specific ID isn't found, nil is returned in its place instead of a deserialized object:

notes = repository.find(1, 3, 2)
# => [<MyNote ...>, nil, <MyNote ...>]

Handle the missing objects in the application code, or call compact on the result.
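Both options can be sketched with plain Ruby. The results array below is a hand-written stand-in for the return value of repository.find(1, 3, 2), and the Note struct is a placeholder for the domain class. The sketch assumes, as in the example above, that results come back in the order of the requested IDs.

```ruby
Note = Struct.new(:id, :title)

ids = [1, 3, 2]
# Stand-in for the value of `repository.find(1, 3, 2)`; no document has ID 3.
results = [Note.new(1, 'QUICK BROWN FOX'), nil, Note.new(2, 'FAST WHITE DOG')]

# Option 1: drop the missing entries.
found = results.compact

# Option 2: pair each ID with its result to learn which ones were missing.
missing_ids = ids.zip(results).select { |_id, note| note.nil? }.map(&:first)

puts found.map(&:title).join(', ')
puts "Missing IDs: #{missing_ids.join(', ')}"
```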

Search

The search method retrieves objects from the repository by a query string or by a definition in the Elasticsearch DSL:

repository.search('fox or dog').to_a
# GET http://localhost:9200/notes_development/my_note/_search?q=fox
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]

repository.search(query: { match: { title: 'fox dog' } }).to_a
# GET http://localhost:9200/notes_development/my_note/_search
# > {"query":{"match":{"title":"fox dog"}}}
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
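As the request logs above suggest, a string ends up in the q URL parameter while a hash is sent as the request body. A dispatch of that kind could look roughly like the sketch below. search_arguments is a hypothetical helper, not the gem's code; q and body are the parameter names used by the Elasticsearch Ruby client's search method.

```ruby
# Hypothetical helper: build the arguments for a client search call.
def search_arguments(query_or_definition, index:, type:)
  target = { index: index, type: type }

  case query_or_definition
  when String then target.merge(q: query_or_definition)    # URI search
  when Hash   then target.merge(body: query_or_definition) # query DSL
  else
    raise ArgumentError, "Expected a String or Hash, got #{query_or_definition.class}"
  end
end

by_string = search_arguments('fox or dog', index: 'notes_development', type: 'my_note')
by_dsl    = search_arguments({ query: { match: { title: 'fox dog' } } },
                             index: 'notes_development', type: 'my_note')

puts by_string[:q]
puts by_dsl[:body][:query][:match][:title]
```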

The returned object is an instance of the Elasticsearch::Persistence::Repository::Response::Results class, which provides access to the results, the full returned response and hits.

results = repository.search(query: { match: { title: 'fox dog' } })

# Iterate over the objects
#
results.each do |note|
  puts "* #{note.attributes[:title]}"
end
# * QUICK BROWN FOX
# * FAST WHITE DOG

# Iterate over the objects and hits
#
results.each_with_hit do |note, hit|
  puts "* #{note.attributes[:title]}, score: #{hit._score}"
end
# * QUICK BROWN FOX, score: 0.29930896
# * FAST WHITE DOG, score: 0.29930896

# Get total results
#
results.total
# => 2

# Access the raw response as a Hashie::Mash instance
results.response._shards.failed
# => 0
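To make the relationship between results, hits and the raw response more concrete, here is a simplified wrapper over a hand-written response hash. SketchResults is a hypothetical class and not the library's Results implementation; in particular it uses plain hashes (hit['_score']) where the library exposes Hashie::Mash objects (hit._score).

```ruby
# Hypothetical, simplified results wrapper; NOT the library's class.
class SketchResults
  include Enumerable

  attr_reader :response

  # The block turns a raw hit into a domain object.
  def initialize(response, &deserializer)
    @response = response
    @deserializer = deserializer
  end

  def hits
    response['hits']['hits']
  end

  def total
    response['hits']['total']
  end

  # Deserialized objects, built once and cached.
  def results
    @results ||= hits.map { |hit| @deserializer.call(hit) }
  end

  def each(&block)
    results.each(&block)
  end

  # Yield each object together with the raw hit it was built from.
  def each_with_hit(&block)
    results.zip(hits).each(&block)
  end
end

response = {
  '_shards' => { 'failed' => 0 },
  'hits' => {
    'total' => 2,
    'hits' => [
      { '_score' => 0.29930896, '_source' => { 'title' => 'QUICK BROWN FOX' } },
      { '_score' => 0.29930896, '_source' => { 'title' => 'FAST WHITE DOG' } }
    ]
  }
}

results = SketchResults.new(response) { |hit| hit['_source'] }

results.each_with_hit do |note, hit|
  puts "* #{note['title']}, score: #{hit['_score']}"
end
puts results.total
puts results.response['_shards']['failed']
```

Including Enumerable on top of each is what makes calls such as first, map and to_a work on the results object.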

Example Application

An example Sinatra application is available in examples/sinatra/application.rb, and demonstrates a rich set of features of the repository.

The ActiveRecord Pattern

Work in progress. The ActiveRecord pattern will work in a very similar way to Tire::Model::Persistence, allowing a drop-in replacement of an Elasticsearch-backed model in Ruby on Rails applications.

License

This software is licensed under the Apache 2 license, quoted below.

Copyright (c) 2014 Elasticsearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.