Custom analyzer not working as expected #697
Comments
Hi, is this in the context of search for Rubygems? I think it's better to try this out with an isolated example, and then try to integrate it into the example application. What I've come up with is this Ruby code, I'll add it to the
# Custom Analyzer for ActiveRecord integration with Elasticsearch
# ===============================================================
$LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
require 'ansi'
require 'logger'
require 'active_record'
require 'elasticsearch/model'
ActiveRecord::Base.logger = ActiveSupport::Logger.new(STDOUT)
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ":memory:" )
ActiveRecord::Schema.define(version: 1) do
  create_table :articles do |t|
    t.string :title
    t.date :published_at
    t.timestamps
  end
end
Elasticsearch::Model.client.transport.logger = ActiveSupport::Logger.new(STDOUT)
Elasticsearch::Model.client.transport.logger.formatter = lambda { |s, d, p, m| "#{m.ansi(:faint)}\n" }
class Article < ActiveRecord::Base
  include Elasticsearch::Model

  settings index: {
    number_of_shards: 1,
    number_of_replicas: 0,
    analysis: {
      analyzer: {
        rubygem: {
          type: 'pattern',
          pattern: "_",
          lowercase: true
        }
      }
    }
  } do
    mapping do
      indexes :title, type: 'text' do
        indexes :keyword, analyzer: 'keyword'
        indexes :rubygem, analyzer: 'rubygem'
      end
    end
  end
end
# Create example records
#
Article.delete_all
Article.create title: 'Foo'
Article.create title: 'Foo_Bar_Baz'
Article.create title: 'Bar'
# Index records
#
errors = Article.import force: true, refresh: true, return: 'errors'
unless errors.empty?
  # Print first, then exit: in `puts "..." && exit(1)` the exit runs before puts is called
  puts "[!] Errors importing records: #{errors.map { |d| d['index']['error'] }.join(', ')}".ansi(:red)
  exit(1)
end
puts '', '-'*80
puts "Fulltext analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
       .analyze(index: Article.index_name, field: 'title', text: 'Foo_Bar_1')['tokens']
       .map { |d| d['token'] }.join(', '),
     "\n"
puts "Keyword analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
       .analyze(index: Article.index_name, field: 'title.keyword', text: 'Foo_Bar_1')['tokens']
       .map { |d| d['token'] }.join(', '),
     "\n"
puts "Rubygem analyzer [Foo_Bar_1]:".ansi(:bold),
     Article.__elasticsearch__.client.indices
       .analyze(index: Article.index_name, field: 'title.rubygem', text: 'Foo_Bar_1')['tokens']
       .map { |d| d['token'] }.join(', '),
     "\n"
puts '', '-'*80
response = Article.search 'foo';
puts "Simple search for 'foo':".ansi(:bold),
response.records.map { |d| d.title }.inspect,
"\n"
puts '', '-'*80
response = Article.search query: { match: { 'title.rubygem' => 'foo' } } ;
puts "Rubygem search for 'foo':".ansi(:bold),
response.records.map { |d| d.title }.inspect,
"\n"
puts '', '-'*80
require 'pry'; binding.pry;
Can you try this example? Can you post here examples of values, and how you want to have them analyzed? I'm not 100% sure what is the intention with your original pattern,
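As a rough illustration of what the `rubygem` analyzer in the script above is configured to do (split on `_`, then lowercase), here is a pure-Ruby stand-in. The method name `approximate_pattern_analyzer` is made up for this sketch and it is not Elasticsearch code; it only mimics the expected token output so the behaviour can be reasoned about without a running cluster.

```ruby
# Hypothetical helper: approximates a `pattern` analyzer with pattern "_"
# and lowercase: true. Illustration only, not how Elasticsearch implements it.
def approximate_pattern_analyzer(text, pattern: /_/, lowercase: true)
  # Split on the separator pattern and drop empty tokens
  tokens = text.split(pattern).reject(&:empty?)
  lowercase ? tokens.map(&:downcase) : tokens
end

p approximate_pattern_analyzer('Foo_Bar_Baz') # prints ["foo", "bar", "baz"]
p approximate_pattern_analyzer('Foo')         # prints ["foo"]
```

Under this approximation, a `match` query for `foo` on `title.rubygem` would be expected to hit both `Foo` and `Foo_Bar_Baz`, since both produce the token `foo`.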
Yes. It is also my explanation for using Thank you so much for your example 💙. I have modified your script to better illustrate my point (See script).
So, the fulltext analyzer can tokenize on hyphens but not on underscores? I was under the impression that if I use the same index name (
Right! Ping me if you'd like to get some help, preferably by opening an issue in the rubygems/rubygems.org repository, and mentioning me. In the past, we have been working with David Radcliffe on the initial implementation.
Yes, that's what I understood, I was confused by the escaping and the whitespace character there. I've checked it now, and it works the same as my example, though I like the
Yes, correct. Of course, it depends on the specific analyzer; the default one is the "Standard Analyzer".
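The hyphen versus underscore difference can be sketched in plain Ruby. This is only a rough approximation of the Standard Analyzer (which follows Unicode word segmentation rules and differs in other cases, such as apostrophes or dots between digits); it happens to work for this point because Ruby's `\w` also treats `_` as a word character. The method name `approximate_standard_analyzer` is an assumption made up for this example.

```ruby
# Hypothetical helper: a rough approximation of the default analyzer for
# ASCII input. Hyphens separate tokens, underscores do not.
def approximate_standard_analyzer(text)
  text.scan(/\w+/).map(&:downcase)
end

p approximate_standard_analyzer('foo-bar')   # prints ["foo", "bar"]
p approximate_standard_analyzer('Foo_Bar_1') # prints ["foo_bar_1"]
```

This is why a plain search for `foo` on the default-analyzed `title` field would not be expected to match `Foo_Bar_Baz`: the whole value stays a single token.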
Ha, that's actually a bug in the
indexes :title, type: 'text', analyzer: 'snowball' do
  indexes :tokenized, analyzer: 'simple'
end
That syntax is probably a left-over from the old "multi_field" type of mappings, I'll have to revisit that application template. Please let me know if you have more questions.
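For orientation, a nested `indexes` block like the one quoted above corresponds to an Elasticsearch multi-field: the main `title` field plus a `title.tokenized` sub-field under `fields`. The hash below is a hand-written sketch of that shape, not output captured from the gem; the explicit `type: 'text'` on the sub-field is an assumption.

```ruby
require 'json'

# Hand-written sketch of the mapping shape (assumed, not generated by the gem):
# `title` analyzed with snowball, plus a `title.tokenized` sub-field.
title_mapping = {
  title: {
    type: 'text',
    analyzer: 'snowball',
    fields: {
      tokenized: { type: 'text', analyzer: 'simple' } # sub-field type is an assumption
    }
  }
}

puts JSON.pretty_generate(properties: title_mapping)
```

The mapping a model actually generates can be compared against this by inspecting `Article.mappings.to_hash` in a console.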
Thanks. We just need to upgrade to ES 5 before we can enable it on rubygems.org.
I would really appreciate it if you could go over our implementation and test when you can find some time. Thanks again for helping me figure this out.
I'll try to do that, do @-mention me whenever you would feel like getting some feedback on the Rubygems.org repo. I'll try to ping @dwradcliffe on the Slack channel we had back then, to see if we can continue expanding the search features.
Steps to reproduce:
03-expert.rb template
example_1, example_gem
example
I don't get any results back. My pattern analyzer seems to be working fine:
Any kind of help or general guidance about debugging this would be really appreciated :) I am using elasticsearch 5.3.