Persistence layer for Ruby domain objects in Elasticsearch, using the Repository and ActiveRecord patterns.
The library is compatible with Ruby 1.9.3 (or higher) and Elasticsearch 1.0 (or higher).
Install the package from Rubygems:
gem install elasticsearch-persistence
To use an unreleased version, either add it to your Gemfile for Bundler:
gem 'elasticsearch-persistence', git: 'https://github.com/elasticsearch/elasticsearch-rails.git'
or install it from a source code checkout:
git clone https://github.com/elasticsearch/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-persistence
bundle install
rake install
The Elasticsearch::Persistence::Repository module provides an implementation of the repository pattern and allows you to save, delete, find and search objects stored in Elasticsearch, as well as to configure mappings and settings for the index.
Let's have a simple plain old Ruby object (PORO):
class Note
attr_reader :attributes
def initialize(attributes={})
@attributes = attributes
end
def to_hash
@attributes
end
end
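Judging from the examples in this README, the default repository stores whatever to_hash returns as the JSON source of the document, and builds the object again from that source when loading it. The following plain-Ruby sketch involves no Elasticsearch at all: a JSON round trip stands in for the storage. It shows why an object loaded back from the repository has string keys even though it was created with symbol keys:

```ruby
require 'json'

# The plain old Ruby object from above.
class Note
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

note = Note.new id: 1, text: 'Test'

# "Saving": the document body is the JSON form of to_hash.
body = JSON.generate(note.to_hash)
puts body # prints {"id":1,"text":"Test"}

# "Loading": parsing the stored JSON yields string keys, so the
# re-created object holds "id" and "text", not :id and :text.
restored = Note.new(JSON.parse(body))
p restored.attributes
```

This is consistent with the find output in the walkthrough, where the attributes come back keyed by strings.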
Let's create a default, "dumb" repository, as a first step:
require 'elasticsearch/persistence'
repository = Elasticsearch::Persistence::Repository.new
We can save a Note instance into the repository...
note = Note.new id: 1, text: 'Test'
repository.save(note)
# PUT http://localhost:9200/repository/note/1 [status:201, request:0.210s, query:n/a]
# > {"id":1,"text":"Test"}
# < {"_index":"repository","_type":"note","_id":"1","_version":1,"created":true}
...find it...
n = repository.find(1)
# GET http://localhost:9200/repository/_all/1 [status:200, request:0.003s, query:n/a]
# < {"_index":"repository","_type":"note","_id":"1","_version":2,"found":true, "_source" : {"id":1,"text":"Test"}}
=> <Note:0x007fcbfc0c4980 @attributes={"id"=>1, "text"=>"Test"}>
...search for it...
repository.search(query: { match: { text: 'test' } }).first
# GET http://localhost:9200/repository/_search [status:200, request:0.005s, query:0.002s]
# > {"query":{"match":{"text":"test"}}}
# < {"took":2, ... "hits":{"total":1, ... "hits":[{ ... "_source" : {"id":1,"text":"Test"}}]}}
=> <Note:0x007fcbfc1c7b70 @attributes={"id"=>1, "text"=>"Test"}>
...or delete it:
repository.delete(note)
# DELETE http://localhost:9200/repository/note/1 [status:200, request:0.014s, query:n/a]
# < {"found":true,"_index":"repository","_type":"note","_id":"1","_version":3}
=> {"found"=>true, "_index"=>"repository", "_type"=>"note", "_id"=>"1", "_version"=>3}
The repository module provides a number of features and facilities to configure and customize the behaviour:
- Configuring the Elasticsearch client being used
- Setting the index name, document type, and object class for deserialization
- Composing mappings and settings for the index
- Creating, deleting or refreshing the index
- Finding or searching for documents
- Providing access both to domain objects and hits for search results
- Providing access to the Elasticsearch response for search results (aggregations, total, ...)
- Defining the methods for serialization and deserialization
You can use the default repository class, or include the module in your own. Let's review it in detail.
For simple cases, you can use the default, bundled repository class, and configure/customize it:
repository = Elasticsearch::Persistence::Repository.new do
# Configure the Elasticsearch client
client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true
# Set a custom index name
index :my_notes
# Set a custom document type
type :my_note
# Specify the class to initialize when deserializing documents
klass Note
# Configure the settings and mappings for the Elasticsearch index
settings number_of_shards: 1 do
mapping do
indexes :text, analyzer: 'snowball'
end
end
# Customize the serialization logic
def serialize(document)
super.merge(my_special_key: 'my_special_stuff')
end
# Customize the de-serialization logic
def deserialize(document)
puts "# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****"
super
end
end
The repository will now use the custom Elasticsearch client, the custom index and type names, and the custom serialization and deserialization logic.
We can create the index with the desired settings and mappings:
repository.create_index! force: true
# PUT http://localhost:9200/my_notes
# > {"settings":{"number_of_shards":1},"mappings":{ ... {"text":{"analyzer":"snowball","type":"string"}}}}}
Save the document with the extra properties added by the serialize method:
repository.save(note)
# PUT http://localhost:9200/my_notes/my_note/1
# > {"id":1,"text":"Test","my_special_key":"my_special_stuff"}
{"_index"=>"my_notes", "_type"=>"my_note", "_id"=>"1", "_version"=>4, ... }
And deserialize it:
repository.find(1)
# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****
<Note:0x007f9bd782b7a0 @attributes={... "my_special_key"=>"my_special_stuff"}>
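The custom serialize method in the configuration block calls super to obtain the default serialization and merges an extra key into it. The same override pattern in plain Ruby looks like this; the DefaultSerialization module is hypothetical and stands in for the repository's built-in behaviour, which is assumed here to be document.to_hash:

```ruby
# Hypothetical stand-in for the default serialization:
# the document is stored as whatever its to_hash returns.
module DefaultSerialization
  def serialize(document)
    document.to_hash
  end
end

class CustomRepository
  include DefaultSerialization

  # Same shape as the override above: call the default implementation
  # (super passes the same argument along), then add an extra key.
  # Hash#merge returns a new Hash, so the input is not modified.
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end
end

document = { id: 1, text: 'Test' }
serialized = CustomRepository.new.serialize(document)
p serialized
```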
In most cases, though, you'll want to use a custom class for the repository, so let's do that:
require 'base64'
class NoteRepository
include Elasticsearch::Persistence::Repository
def initialize(options={})
index options[:index] || 'notes'
client Elasticsearch::Client.new url: options[:url], log: options[:log]
end
klass Note
settings number_of_shards: 1 do
mapping do
indexes :text, analyzer: 'snowball'
# Do not index images
indexes :image, index: 'no'
end
end
# Base64 encode the "image" field in the document
#
def serialize(document)
hash = document.to_hash.clone
hash['image'] = Base64.encode64(hash['image']) if hash['image']
hash.to_hash
end
# Base64 decode the "image" field in the document
#
def deserialize(document)
hash = document['_source']
hash['image'] = Base64.decode64(hash['image']) if hash['image']
klass.new hash
end
end
Include the Elasticsearch::Persistence::Repository module to add the repository methods into the class.
You can customize the repository in the familiar way, by calling the DSL-like methods.
You can implement a custom initializer for your repository and add complex logic in its class and instance methods; in general, you have all the freedom of a standard Ruby class.
repository = NoteRepository.new url: 'http://localhost:9200', log: true
# Configure the repository instance
repository.index = 'notes_development'
repository.client.transport.logger.formatter = proc { |s, d, p, m| "\e[2m# #{m}\n\e[0m" }
repository.create_index! force: true
note = Note.new 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...'
repository.save(note)
# PUT http://localhost:9200/notes_development/note/1
# > {"id":1,"text":"Document with image","image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}
puts repository.find(1).attributes['image']
# GET http://localhost:9200/notes_development/note/1
# < {... "_source" : { ... "image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}}
# => ... BINARY DATA ...
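The Base64 handling in NoteRepository can be checked without a running cluster. This sketch copies the logic of its serialize and deserialize methods into two hypothetical helpers that work on plain Hashes; unlike the original deserialize, it clones the source instead of modifying it in place:

```ruby
require 'base64'

# Mirrors NoteRepository#serialize: encode the "image" field, if present.
def serialize_image(hash)
  hash = hash.clone
  hash['image'] = Base64.encode64(hash['image']) if hash['image']
  hash
end

# Mirrors NoteRepository#deserialize: decode the "image" field, if present.
def deserialize_image(source)
  source = source.clone
  source['image'] = Base64.decode64(source['image']) if source['image']
  source
end

original = { 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...' }

stored = serialize_image(original)
p stored['image'] # => "Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"

restored = deserialize_image(stored)
puts restored['image'] # prints ... BINARY DATA ...
```

The encoded value matches the one in the logged request body above, including the trailing newline that Base64.encode64 appends.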
The repository uses the standard Elasticsearch client, which is accessible with the client getter and setter methods:
repository.client = Elasticsearch::Client.new url: 'http://search.server.org'
repository.client.transport.logger = Logger.new(STDERR)
The index method specifies the Elasticsearch index to use for storage, lookup and search (when not set, the value is inferred from the repository class name):
repository.index = 'notes_development'
The type method specifies the Elasticsearch document type to use for storage, lookup and search (when not set, the value is inferred from the document class name, or _all is used):
repository.type = 'my_note'
The klass method specifies the Ruby class to use when initializing objects from documents retrieved from the repository (when not set, the class is inferred from the document _type as fetched from Elasticsearch):
repository.klass = MyNote
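The inference rules for type and klass amount to converting between a CamelCase class name and an underscored type name. A simplified sketch with two hypothetical helpers; the gem presumably relies on ActiveSupport-style inflections, which also handle namespaces and acronyms that this version ignores:

```ruby
# Hypothetical: derive a document type from a class, "MyNote" -> "my_note".
def type_for(klass)
  klass.name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical: resolve a class from a document _type, "my_note" -> MyNote.
# The constant must already be defined, otherwise NameError is raised.
def class_for(type)
  Object.const_get(type.split('_').map(&:capitalize).join)
end

class Note; end
class MyNote; end

p type_for(Note)       # => "note"
p type_for(MyNote)     # => "my_note"
p class_for('note')    # => Note
p class_for('my_note') # => MyNote
```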
The settings and mappings methods, provided by the elasticsearch-model gem, allow you to configure the index properties:
repository.settings number_of_shards: 1
repository.settings.to_hash
# => {:number_of_shards=>1}
repository.mappings { indexes :title, analyzer: 'snowball' }
repository.mappings.to_hash
# => { :note => {:properties=> ... }}
The convenience methods create_index!, delete_index! and refresh_index! allow you to manage the index lifecycle.
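The request logged earlier for create_index! combines the index settings and the per-type mappings into a single body. A hypothetical helper that assembles the same structure from plain Hashes, assuming the Elasticsearch 1.x layout in which mappings are keyed by document type:

```ruby
require 'json'

# Hypothetical helper: build a create-index request body of the form
# {"settings": {...}, "mappings": {TYPE: {"properties": {...}}}}.
def create_index_body(type, settings, properties)
  {
    settings: settings,
    mappings: { type => { properties: properties } }
  }
end

body = create_index_body(
  :my_note,
  { number_of_shards: 1 },
  { text: { analyzer: 'snowball', type: 'string' } }
)

puts JSON.generate(body)
# prints {"settings":{"number_of_shards":1},"mappings":{"my_note":{"properties":{"text":{"analyzer":"snowball","type":"string"}}}}}
```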
The serialize and deserialize methods allow you to customize the serialization of the document when passing it to the storage, and the initialization procedure when loading it from the storage:
class NoteRepository
def serialize(document)
# Upcase the title in the stored copy, without mutating the domain object
Hash[document.to_hash.map { |k, v| [k, k == :title ? v.upcase : v] }]
end
def deserialize(document)
MyNote.new ActiveSupport::HashWithIndifferentAccess.new(document['_source']).deep_symbolize_keys
end
end
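Both hooks can be exercised without Elasticsearch or ActiveSupport. In this sketch, serialize uses upcase rather than upcase!, so the title of the domain object itself is not modified as a side effect, and deserialize converts only the top-level keys to symbols, a shallow stand-in for deep_symbolize_keys:

```ruby
# Return a copy of the document with an upcased :title; the input is untouched.
def serialize(document)
  Hash[document.to_hash.map { |k, v| [k, k == :title ? v.upcase : v] }]
end

# Shallow stand-in for deep_symbolize_keys: only top-level keys become symbols.
def deserialize(document)
  Hash[document['_source'].map { |k, v| [k.to_sym, v] }]
end

note = { id: 1, title: 'Quick Brown Fox' }
p serialize(note)
puts note[:title] # prints Quick Brown Fox

hit = { '_id' => '1', '_source' => { 'id' => 1, 'title' => 'QUICK BROWN FOX' } }
p deserialize(hit)
```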
The save method allows you to store a domain object in the repository:
note = Note.new id: 1, title: 'Quick Brown Fox'
repository.save(note)
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>1, "created"=>true}
The update method allows you to perform a partial update of a document in the repository.
Use either a partial document:
repository.update id: 1, title: 'UPDATED', tags: []
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>2}
Or a script (optionally with parameters):
repository.update 1, script: 'if (!ctx._source.tags.contains(t)) { ctx._source.tags += t }', params: { t: 'foo' }
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>3}
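Both kinds of update are applied by Elasticsearch on the server. As an illustration only, this sketch reproduces their effect on the stored _source with plain Hashes: a partial document is merged recursively into existing objects while scalar values and arrays are replaced, and the script appends the tag only when it is not yet present. The helper names are hypothetical:

```ruby
# Partial document: merge nested objects recursively, replace everything else.
def merge_partial(source, partial)
  source.merge(partial) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? merge_partial(old, new) : new
  end
end

# Script: append tag t unless already present. Like the original script,
# this assumes the "tags" array exists (the partial update creates it).
def add_tag(source, t)
  return source if source['tags'].include?(t)
  source.merge('tags' => source['tags'] + [t])
end

source = { 'id' => 1, 'title' => 'Quick Brown Fox' }
source = merge_partial(source, { 'id' => 1, 'title' => 'UPDATED', 'tags' => [] })
source = add_tag(source, 'foo')
source = add_tag(source, 'foo') # running the script again changes nothing
p source['tags'] # => ["foo"]
```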
The delete method allows you to remove objects from the repository (pass either the object itself or its ID):
repository.delete(note)
repository.delete(1)
The find method allows you to find one or many documents in the storage and returns them as deserialized Ruby objects:
repository.save Note.new(id: 2, title: 'Fast White Dog')
note = repository.find(1)
# => <MyNote ... QUICK BROWN FOX>
notes = repository.find(1, 2)
# => [<MyNote... QUICK BROWN FOX>, <MyNote ... FAST WHITE DOG>]
When the document with a specific ID isn't found, nil is returned instead of the deserialized object:
notes = repository.find(1, 3, 2)
# => [<MyNote ...>, nil, <MyNote ...>]
Handle the missing objects in the application code, or call compact on the result.
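The positional nil behaviour is easy to reproduce with a hypothetical in-memory store: each requested ID maps to one slot in the result, so missing documents show up as nil at their position, and compact drops them:

```ruby
# Hypothetical in-memory stand-in for the storage, keyed by ID.
store = { 1 => 'QUICK BROWN FOX', 2 => 'FAST WHITE DOG' }

# One result slot per requested ID; Hash#[] returns nil for a missing key.
def find_many(store, *ids)
  ids.map { |id| store[id] }
end

notes = find_many(store, 1, 3, 2)
p notes         # => ["QUICK BROWN FOX", nil, "FAST WHITE DOG"]
p notes.compact # => ["QUICK BROWN FOX", "FAST WHITE DOG"]
```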
The search method allows you to retrieve objects from the repository by a query string or by a query definition in the Elasticsearch DSL:
repository.search('fox or dog').to_a
# GET http://localhost:9200/notes_development/my_note/_search?q=fox+or+dog
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
repository.search(query: { match: { title: 'fox dog' } }).to_a
# GET http://localhost:9200/notes_development/my_note/_search
# > {"query":{"match":{"title":"fox dog"}}}
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
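As the logged requests show, the two calls differ in how the query reaches Elasticsearch: a String is sent in the q URL parameter (Lucene query string syntax), while a Hash is sent as the JSON request body (Query DSL). A hypothetical dispatch on the argument type, sketched under that assumption:

```ruby
# Hypothetical: turn a search argument into request parts.
def search_request(query)
  case query
  when String then { params: { q: query } } # query string in the URL
  when Hash   then { body: query }          # Query DSL in the request body
  else raise ArgumentError, "unsupported query: #{query.inspect}"
  end
end

p search_request('fox or dog')
p search_request({ query: { match: { title: 'fox dog' } } })
```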
The returned object is an instance of the Elasticsearch::Persistence::Repository::Response::Results class, which provides access to the results, the full returned response and the hits.
results = repository.search(query: { match: { title: 'fox dog' } })
# Iterate over the objects
#
results.each do |note|
puts "* #{note.attributes[:title]}"
end
# * QUICK BROWN FOX
# * FAST WHITE DOG
# Iterate over the objects and hits
#
results.each_with_hit do |note, hit|
puts "* #{note.attributes[:title]}, score: #{hit._score}"
end
# * QUICK BROWN FOX, score: 0.29930896
# * FAST WHITE DOG, score: 0.29930896
# Get total results
#
results.total
# => 2
# Access the raw response as a Hashie::Mash instance
results.response._shards.failed
# => 0
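A much simplified, hypothetical stand-in for the results wrapper shows the idea behind it: the deserialized objects are kept alongside the raw hits, so both can be iterated together, and the total comes from the response. Unlike the real class, it works with plain Hashes with string keys instead of Hashie::Mash:

```ruby
# Hypothetical, simplified results wrapper (not the gem's class).
class SimpleResults
  include Enumerable

  attr_reader :response

  # The block turns one raw hit into a domain object.
  def initialize(response, &deserialize)
    @response = response
    @hits     = response['hits']['hits']
    @results  = @hits.map(&deserialize)
  end

  def each(&block)
    @results.each(&block)
  end

  # Yield each object together with the hit it was built from.
  def each_with_hit
    @results.zip(@hits).each { |result, hit| yield result, hit }
  end

  def total
    @response['hits']['total']
  end
end

response = {
  '_shards' => { 'failed' => 0 },
  'hits' => {
    'total' => 2,
    'hits' => [
      { '_score' => 0.29930896, '_source' => { 'title' => 'QUICK BROWN FOX' } },
      { '_score' => 0.29930896, '_source' => { 'title' => 'FAST WHITE DOG' } }
    ]
  }
}

results = SimpleResults.new(response) { |hit| hit['_source']['title'] }
results.each_with_hit { |title, hit| puts "* #{title}, score: #{hit['_score']}" }
# prints * QUICK BROWN FOX, score: 0.29930896
#        * FAST WHITE DOG, score: 0.29930896
```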
An example Sinatra application is available in examples/sinatra/application.rb and demonstrates a rich set of features of the repository.
Work in progress.
The ActiveRecord pattern will work in a very similar way to Tire::Model::Persistence, allowing a drop-in replacement of an Elasticsearch-backed model in Ruby on Rails applications.
This software is licensed under the Apache 2 license, quoted below.
Copyright (c) 2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.