Custom analyzer not working as expected #697

Closed
sonalkr132 opened this issue Apr 9, 2017 · 5 comments

@sonalkr132

Steps to reproduce:

  • Set up 03-expert.rb template
$ rails new searchapp --skip --skip-bundle --template https://raw.github.com/elasticsearch/elasticsearch-rails/master/elasticsearch-rails/lib/rails/templates/01-basic.rb
$ rails new searchapp --skip --skip-bundle --template https://raw.github.com/elasticsearch/elasticsearch-rails/master/elasticsearch-rails/lib/rails/templates/02-pretty.rb
$ rails new searchapp --skip --skip-bundle --template https://raw.github.com/elasticsearch/elasticsearch-rails/master/elasticsearch-rails/lib/rails/templates/03-expert.rb
  • Add a custom analyzer to article title:
-    settings index: { number_of_shards: 1, number_of_replicas: 0 } do
+    settings index: { number_of_shards: 1, number_of_replicas: 0, analysis: {
+               analyzer: {
+                 rubygem: {
+                   type: 'pattern',
+                   pattern: "[\s#{Regexp.escape('.-_')}]+"
+                 }
+               }
+             } } do
       mapping do
         indexes :title, type: 'text' do
-          indexes :title,     analyzer: 'snowball'
+          indexes :title,     analyzer: 'rubygem'
           indexes :tokenized, analyzer: 'simple'
         end
  • Reimport records
$ rake environment elasticsearch:import:model CLASS='Article' FORCE=y
  • Start the Rails server and create a few articles with titles such as example_1 and example_gem
  • Search for example.

I don't get any results back. My pattern analyzer seems to be working fine:

$ curl -XGET 'localhost:9200/searchapp_application_development/_analyze?analyzer=rubygem&pretty=true' -d 'example_1'
{
  "tokens" : [
    {
      "token" : "example",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "1",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    }
  ]
}

Any kind of help or general guidance on debugging this would be really appreciated :) I am using Elasticsearch 5.3.

@karmi
Contributor

karmi commented Apr 10, 2017

Hi, is this in the context of search for Rubygems?

I think it's better to try this out with an isolated example, and then try to integrate it into the example application.

What I've come up with is the Ruby code below; I'll add it to the examples folder of the repository later:

# Custom Analyzer for ActiveRecord integration with Elasticsearch
# ===============================================================

$LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)

require 'ansi'
require 'logger'

require 'active_record'
require 'elasticsearch/model'

ActiveRecord::Base.logger = ActiveSupport::Logger.new(STDOUT)
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ":memory:" )

ActiveRecord::Schema.define(version: 1) do
  create_table :articles do |t|
    t.string :title
    t.date    :published_at
    t.timestamps
  end
end

Elasticsearch::Model.client.transport.logger = ActiveSupport::Logger.new(STDOUT)
Elasticsearch::Model.client.transport.logger.formatter = lambda { |s, d, p, m| "#{m.ansi(:faint)}\n" }

class Article < ActiveRecord::Base
  include Elasticsearch::Model

  settings index: {
    number_of_shards: 1,
    number_of_replicas: 0,
    analysis: {
      analyzer: {
        rubygem: {
          type: 'pattern',
          pattern: "_",
          lowercase: true
        }
      }
    } } do
    mapping do
      indexes :title, type: 'text' do
        indexes :keyword, analyzer: 'keyword'
        indexes :rubygem, analyzer: 'rubygem'
      end
    end
  end
end

# Create example records
#
Article.delete_all
Article.create title: 'Foo'
Article.create title: 'Foo_Bar_Baz'
Article.create title: 'Bar'

# Index records
#
errors = Article.import force: true, refresh: true, return: 'errors'
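# Print the failures first, then exit; chaining `&& exit(1)` onto the `puts`
# argument would exit before anything is printed.
unless errors.empty?
  puts "[!] Errors importing records: #{errors.map { |d| d['index']['error'] }.join(', ')}".ansi(:red)
  exit(1)
end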

puts '', '-'*80

puts "Fulltext analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
      .analyze(index: Article.index_name, field: 'title', text: 'Foo_Bar_1')['tokens']
      .map { |d| d['token'] }.join(', '),
    "\n"

puts "Keyword analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
      .analyze(index: Article.index_name, field: 'title.keyword', text: 'Foo_Bar_1')['tokens']
      .map { |d| d['token'] }.join(', '),
     "\n"

puts "Rubygem analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
      .analyze(index: Article.index_name, field: 'title.rubygem', text: 'Foo_Bar_1')['tokens']
      .map { |d| d['token'] }.join(', '),
     "\n"

puts '', '-'*80

response = Article.search 'foo';

puts "Simple search for 'foo':".ansi(:bold),
     response.records.map { |d| d.title }.inspect,
     "\n"

puts '', '-'*80

response = Article.search query: { match: { 'title.rubygem' => 'foo' } } ;

puts "Rubygem search for 'foo':".ansi(:bold),
     response.records.map { |d| d.title }.inspect,
     "\n"

puts '', '-'*80

require 'pry'; binding.pry;

Can you try this example? Can you also post some example values here, and how you want them to be analyzed? I'm not 100% sure what the intention is with your original pattern, [\s#{Regexp.escape('.-_')}]+.
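For reference, the interpolated pattern can be inspected in plain Ruby. This is a minimal sketch using only the Ruby core library; it assumes nothing about Elasticsearch itself. Inside a double-quoted Ruby string, `\s` is a literal space rather than the regex whitespace class, and `Regexp.escape` backslash-escapes `.` and `-` but leaves `_` alone.

```ruby
# Build the pattern string exactly as it is written in the settings block.
pattern = "[\s#{Regexp.escape('.-_')}]+"

# Show the string that ends up as the analyzer's `pattern` setting.
puts pattern
# prints: [ \.\-_]+
```

So the character class covers a space, a dot, a hyphen and an underscore, one or more times.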

@sonalkr132
Author

sonalkr132 commented Apr 10, 2017

> is this in the context of search for Rubygems?

Yes. That is also my explanation for using [\s#{Regexp.escape('.-_')}]+ . I want to search by gem name, and the characters in '.-_' are where names split. For example, if I query elasticsearch, I should get back gems named elasticsearch_rails, elasticsearch-rails, elasticsearch_rails.rb etc.
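The intended behaviour can be sketched in plain Ruby. This is only an approximation under stated assumptions: Ruby's regex engine stands in for the Java regex used by the pattern analyzer, and the helper names `gem_tokens` and `matching_gems` are hypothetical, made up for illustration.

```ruby
# Approximate the intended analysis: lowercase, then split on whitespace,
# '.', '-' and '_'. (Assumption: Ruby regex in place of the Java regex.)
SEPARATORS = /[\s.\-_]+/

def gem_tokens(name)
  name.downcase.split(SEPARATORS).reject(&:empty?)
end

# A gem name matches when it shares at least one token with the query.
def matching_gems(query, names)
  query_tokens = gem_tokens(query)
  names.select { |name| (gem_tokens(name) & query_tokens).any? }
end

names = %w[elasticsearch_rails elasticsearch-rails elasticsearch_rails.rb rails]
puts matching_gems('elasticsearch', names)
```

With these sample names, the three elasticsearch variants are printed and the plain rails entry is not.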

Thank you so much for your example 💙 . I have modified your script to better illustrate my point (See script).
I am miffed over the following output (See complete output):

Fulltext analyzer [Foo_Bar_1]:
foo_bar_1
Fulltext analyzer [Foo-Bar-1]:
foo, bar, 1

Search `title` for 'foo':
["Foo", "Foo-Bar-Baz"]
Search `title.title` for 'foo':
["Foo", "Foo_Bar_Baz", "Foo-Bar-Baz"]

So, the fulltext analyzer can tokenize on hyphens but not on underscores? I was under the impression that if I use the same field name (title) when specifying the analyzer, the analyzed field will be used in my query.
Apparently, title uses the fulltext analyzer and I should be using title.title if I want to use the analyzed field?

@karmi
Contributor

karmi commented Apr 12, 2017

> > is this in the context of search for Rubygems?
>
> Yes.

Right! Ping me if you'd like to get some help, preferably by opening an issue in the rubygems/rubygems.org repository and mentioning me. In the past, we worked with David Radcliffe on the initial implementation.

> It is also my explanation for using [\s#{Regexp.escape('.-_')}]+ . I want to search rubygem name and '.-_' is where names split.

Yes, that's what I understood; I was just confused by the escaping and the whitespace character there. I've checked it now, and it works the same as my example, though I like the "_|-|\\." version better: the | ("or") character seems to express the intent more clearly to me. I'm not sure whether whitespace is permitted in Rubygem names, but it doesn't hurt to add it.
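A quick way to compare the two spellings is to split a few names with both in plain Ruby. This is a sketch under the assumption that Ruby's regex engine is close enough to the Java one for these simple patterns; `split_name` is a hypothetical helper, not part of any library.

```ruby
# Character-class version (with the interpolation already resolved) and the
# alternation version of the separator pattern.
CHARACTER_CLASS = /[ .\-_]+/
ALTERNATION     = /_|-|\./

# Lowercase the name and split it on the given separator pattern.
def split_name(name, pattern)
  name.downcase.split(pattern)
end

%w[Foo_Bar_Baz Foo-Bar-Baz foo.bar_baz-qux].each do |name|
  same = split_name(name, CHARACTER_CLASS) == split_name(name, ALTERNATION)
  puts "#{name}: #{same ? 'same tokens' : 'different tokens'}"
end
```

The two only diverge in Ruby when separators repeat (for example a double underscore), because the alternation has no `+` and `split` then yields an empty string between them.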

> So, fulltext analyzer can tokenize over hyphens but not underscores?

Yes, correct. Of course, it depends on the specific analyzer; the default one is the "Standard Analyzer".
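The difference can be imitated in plain Ruby. This is a rough stand-in for ASCII input only, not the actual Standard Analyzer (which follows Unicode word-boundary rules); `approx_standard_tokens` is a hypothetical helper. The point is that the underscore counts as a word character while the hyphen does not.

```ruby
# Rough approximation for ASCII input: lowercase, then keep runs of word
# characters. \w includes the underscore but not the hyphen.
def approx_standard_tokens(text)
  text.downcase.scan(/\w+/)
end

puts approx_standard_tokens('Foo_Bar_1').join(', ')  # prints: foo_bar_1
puts approx_standard_tokens('Foo-Bar-1').join(', ')  # prints: foo, bar, 1
```

This mirrors the "Fulltext analyzer" output posted above for the same two inputs.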

> I was under the impression that if I use same index name (title) when specifying the analyzer
> Apparently, title uses fulltext analyzer and I should be using title.title if I want to use the analyzed field?

Ha, that's actually a bug in the searchable.rb file. I think the intention there is to set the snowball analyzer as the default for the title property. If so, it should be written like this:

indexes :title, type: 'text', analyzer: 'snowball' do
  indexes :tokenized, analyzer: 'simple'
end

That syntax is probably a left-over from the old "multi_field" type of mappings; I'll have to revisit that application template.
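To make the difference concrete, here is a small sketch that lists which analyzer each queryable field path ends up with. The two hashes are hand-written approximations of the mappings the two DSL forms would produce, not output captured from the library, and `analyzers_by_path` is a hypothetical helper.

```ruby
# Assumed shape of the mapping from the template's nested form: the inner
# `indexes :title` becomes a sub-field, reachable as `title.title`.
NESTED = {
  title: { type: 'text', fields: {
    title:     { type: 'text', analyzer: 'snowball' },
    tokenized: { type: 'text', analyzer: 'simple' }
  } }
}

# Assumed shape of the mapping from the corrected form: the analyzer sits
# on `title` itself.
FIXED = {
  title: { type: 'text', analyzer: 'snowball', fields: {
    tokenized: { type: 'text', analyzer: 'simple' }
  } }
}

# Map each field path to its analyzer; fields without one use the default.
def analyzers_by_path(properties)
  properties.each_with_object({}) do |(name, config), result|
    result[name.to_s] = config.fetch(:analyzer, 'standard (default)')
    config.fetch(:fields, {}).each do |sub, sub_config|
      result["#{name}.#{sub}"] = sub_config.fetch(:analyzer, 'standard (default)')
    end
  end
end

analyzers_by_path(NESTED).each { |path, analyzer| puts "nested #{path}: #{analyzer}" }
analyzers_by_path(FIXED).each  { |path, analyzer| puts "fixed  #{path}: #{analyzer}" }
```

Under these assumptions, a query against title only uses the custom analyzer in the second form, which would explain why title.title was needed in the first.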

Please let me know if you have more questions.

@sonalkr132
Author

> Ping me if you'd like to get some help

Thanks. We just need to update to ES 5 before we can enable it on rubygems.org.

> Please let me know if you have more questions.

I would really appreciate it if you could go over our implementation and test when you can find some time.

Thanks again for helping me figure this out ☺️ I think it is safe to close this.

@karmi
Contributor

karmi commented Apr 13, 2017

I'll try to do that. Do @-mention me whenever you'd like some feedback on the Rubygems.org repo. I'll try to ping @dwradcliffe on the Slack channel we had back then, to see if we can continue expanding the search features.

karmi added a commit that referenced this issue May 4, 2017