Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't access aggregations correctly on Elasticsearch::Model::Response #568

Closed
schovi opened this issue May 21, 2016 · 6 comments
Closed

Comments

@schovi
Copy link

schovi commented May 21, 2016

Original issue: elastic/elasticsearch-ruby#306

I am having single query with 2 nested aggregations

search do
  query do
    match_all({})
  end

  aggregation :price do
    nested do
      path "price"

      aggregation :min do
        min field: "price.price"
      end

      aggregation :max do
        max field: "price.price"
      end
    end
  end
end

Calling elastic and getting result directly works perfectly, but when using ruby api it behave really strange:

[8] pry(main)> print Search.search.to_json
{"query":{"match_all":{}},"aggregations":{"price":{"nested":{"path":"price"},"aggregations":{"min":{"min":{"field":"price.price"}},"max":{"max":{"field":"price.price"}}}}},"size":0}
[9] pry(main)> result = Advert.search(Search.search) # Calling search on ActiveRecord model
=> #<Elasticsearch::Model::Response::Response:0x007f81aa537518 ...>
[10] pry(main)> result.aggregations
=> {"price"=>{"doc_count"=>9428, "min"=>{"value"=>1.0}, "max"=>{"value"=>260000.0}}}
[11] pry(main)> result.aggregations.price
=> {"doc_count"=>9428, "min"=>{"value"=>1.0}, "max"=>{"value"=>260000.0}}
[12] pry(main)> result.aggregations.price.min
=> ["doc_count", 9428]
[13] pry(main)> result.aggregations.price.max
=> ["min", {"value"=>1.0}]
[14] pry(main)> result.aggregations.price.wtf
=> nil

I see 2 problems there

  1. aggregations.price.max returns content of aggregation min
  2. it returns array ["min", {"value" => 1.0}] where I expect another hash

I was able to trace it to Hashie library, where similiar behaviour with returning array is https://github.com/intridea/hashie#methodaccesswithoverride but there my struggle ends.

Setup Ruby 2.3, Rails 5.0.0.beta3, sidekiq 4 and nothing more special

@schovi
Copy link
Author

schovi commented May 21, 2016

There is working playground https://gist.github.com/schovi/96755a56fe797c1530363d82a14be688 just put it in any rails project and run rake elastic:test
While debugging this I found solution :). min and max are methods on Hashie::Mash object

It seems you use Hashie library only for easily accessible Hash? This library seems pretty large and complex just for one thing. Why not to use ActiveSupport::HashWithIndifferentAccess when we are already in rails or what I personally prefer something like: https://github.com/aetherknight/recursive-open-struct/?

@karmi
Copy link
Contributor

karmi commented May 23, 2016

Nice catch! :)

The underlying problem is probably because min and max are defined in Enumerable, so it affects any Hash as well:

h = { foo: 'bar' }
# => {:foo=>"bar"}
h.min
# => [:foo, "bar"]
h.max
# => [:foo, "bar"]

I'll have a look into it.

@schovi
Copy link
Author

schovi commented May 23, 2016

@karmi: Same problem with HashWithIndifferentAccess or any other Hash like object. Worst part on that is that anybody can monkey patch Enumerable or/and any class and module and then something will suddenly stop working.

Example with same problem on HashWithIndifferentAccess and then all "unavailable properties" on Hashie::Mash instance.

[2] pry(main)> x = HashWithIndifferentAccess.new({a: 1, min: 2, max: 3, z: 4})
=> {"a"=>1, "min"=>2, "max"=>3, "z"=>4}
[3] pry(main)> x.min
=> ["a", 1]
[7] pry(main)> x = Hashie::Mash.new({a: 1, min: 2, max: 3, z: 4})
=> {"a"=>1, "min"=>2, "max"=>3, "z"=>4}
[8] pry(main)> ls x
Enumerable#methods:
  all?            count       each_cons         entries     first     index_by  max     minmax_by  reduce        sort        to_set
  chunk           cycle       each_entry        exclude?    flat_map  inject    max_by  none?      reverse_each  sort_by     without
  chunk_while     detect      each_slice        find        grep      lazy      min     one?       slice_after   sum         zip
  collect         drop        each_with_index   find_all    grep_v    many?     min_by  partition  slice_before  take
  collect_concat  drop_while  each_with_object  find_index  group_by  map       minmax  pluck      slice_when    take_while
V8::Conversion::Hash#methods: to_v8
HashRecursiveMerge#methods: rmerge  rmerge!
Hash#methods:
  <                    compare_by_identity?  each          invert                           select           to_proc
  <=                   deep_dup              each_key      keep_if                          select!          to_query
  ==                   deep_stringify_keys   each_pair     key                              shift            to_xml
  >                    deep_stringify_keys!  each_value    keys                             size             transform_keys
  >=                   deep_symbolize_keys   empty?        length                           slice            transform_keys!
  any?                 deep_symbolize_keys!  eql?          nested_under_indifferent_access  slice!           transform_values
  as_json              deep_transform_keys   except        pretty_print                     store            transform_values!
  assert_valid_keys    deep_transform_keys!  except!       pretty_print_cycle               symbolize_keys   value?
  assoc                default               extract!      rassoc                           symbolize_keys!  values
  blank?               default=              fetch_values  rehash                           to_a             with_indifferent_access
  clear                default_proc          flatten       reject                           to_h
  compact              default_proc=         has_value?    reject!                          to_options
  compact!             delete_if             hash          reverse_merge!                   to_options!
  compare_by_identity  dig                   index         reverse_update                   to_param
Hashie::Extensions::PrettyInspect#methods: hashie_inspect
Hashie::Extensions::StringifyKeys#methods: stringify_keys  stringify_keys!
Hashie::Hash#methods: to_hash  to_json  to_mash
Hashie::Mash#methods:
  []               deep_merge   extractable_options?  initializing_reader  merge!          regular_writer  to_module
  []=              deep_merge!  fetch                 inspect              method_missing  replace         to_s
  assign_property  deep_update  has_key?              key?                 prefix_method?  reverse_merge   underbang_reader
  custom_reader    delete       hash_inspect          member?              regular_dup     shallow_merge   update
  custom_writer    dup          include?              merge                regular_reader  shallow_update  values_at

Structs or custom class seems like only solution.

@karmi
Copy link
Contributor

karmi commented May 23, 2016

Yeah, it's bad.

I started playing with it, but the whole thing of course screams "recursion", since we cannot just wrap the aggregations Hash, we have to dive deep into every value. Of course, we cannot re-define Hashie::Max#min or Enumerable#min.

I'm not hell-bent on Hashie::Mash, but it's been used throughout the codebase already, so any replacement will require a larger refactoring.

At the moment, it seems to me like the former approach, ie. wrapping or enhancing the values of response['aggregations'], is more promising.

@karmi
Copy link
Contributor

karmi commented May 23, 2016

So, I have looked into it, and seems like we can simply recursively traverse the values of the response['aggregations'] Hash. Since the min and max names are very common and an obvious choice for an aggregation name, it makes sense to make some effort here.

I've added a dedicated Elasticsearch::Model::Response::Aggregations class, which can have its uses further down the line.

I've added a unit test for the behaviour, and amended the integration test.

Thanks for the report and for all the debugging information, @schovi!

Closed in 6d10abb

@karmi karmi closed this as completed May 23, 2016
@karmi
Copy link
Contributor

karmi commented May 23, 2016

Github seems to be eating the changes, so another try.

I've created a wrapper class Elasticsearch::Model::Response::Aggregations which recursively traverses the aggregations Hash, and re-defines the min and max methods. These are common, and in many cases an obvious choice for aggregation names, so it makes sense to do it, I think.

I've added a unit test, and update the integration test suite. Closed in commit 6d10abb.

karmi added a commit that referenced this issue May 23, 2016
…unexpected values

While people can define names for aggregations such as `min` and `max`, these
methods are defined in `Enumerable#min` and `Enumerable#max`, and return
an unexpected value:

    result.aggregations
    # => {"price"=>{"doc_count"=>9428, "min"=>{"value"=>1.0}, "max"=>{"value"=>260000.0}}}
    result.aggregations.price
    # => {"doc_count"=>9428, "min"=>{"value"=>1.0}, "max"=>{"value"=>260000.0}}
    result.aggregations.price.min
    # => ["doc_count", 9428]

Therefore, the whole Hash with aggregations is being traversed, in a wrapper class,
and a `min` and `max` method is defined for Hashie::Mash instances,
because these names are too common.

The wrapper class can have its uses further down the line, so a dedicated
unit test file has been added as well.

Closes #568

Related: elastic/elasticsearch-ruby#306
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants