Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom transformations to be defined for bulk imports. #113

Closed
wants to merge 1 commit into from
Closed

Allow custom transformations to be defined for bulk imports. #113

wants to merge 1 commit into from

Conversation

AaronRustad
Copy link
Contributor

This allows for the customization of bulk imports. This was driven out of need to include the parent in the index payload, though could extend to any bulk api options.

transform = lambda do |article|
  {index: {_id: article.id, _parent: article.author_id, data: article.__elasticsearch__.as_indexed_json}}
end

Article.import transform: transform

This functionality doesn't extend to the rake import task. Will address in a separate PR.

end

should "call the transform proc when the transform option is set" do
transform = stub("lambda")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use an empty lambda here, to demonstrate the point :)

@AaronRustad
Copy link
Contributor Author

@karmi requested changes applied.

@ryansch
Copy link
Contributor

ryansch commented May 10, 2014

Don't forget about the mongoid adapter.

@karmi
Copy link
Contributor

karmi commented May 10, 2014

@ryansch Grr, right, embarrasing! :)

@AaronRustad So Ryan is right -- and the correct place to transform the batch is in the Elasticsearch::Model::Importing::ClassMethods.import. Now, I think we need to change the chain a bit, so __find_in_batches (specific for each adapter) returns just a collection of model instances, and the transformation to the Elasticsearch payload happens in the (generic) import method. Sounds OK?

@AaronRustad
Copy link
Contributor Author

Makes sense. I'll tackle it in a couple hours.

@karmi
Copy link
Contributor

karmi commented May 10, 2014

No stress, Aaron, will be able to get to it only in the second part of the next week anyways. Thanks for keeping the perseverance! :)

@AaronRustad
Copy link
Contributor Author

hehe... the stress is on my side. I need it! ;-)

@AaronRustad
Copy link
Contributor Author

OK, so I've made some improvements based on your suggestions. I didn't even think to look at the impact on Mongoid, so thanks to @ryansch for noticing.

I've largely pushed the previous code I added up to Importing#import so that it can be used consistently between Mongoid, ActiveRecord, and any future adapters that may be introduced.

I've also introduced Importing#__transform that is intended to return a default transformation proc. Mongoid & ActiveRecord already differ slightly and I would expect that future adapters would too. The user of course has the option of overriding the default transform, and the API defined in the comments of my original PR remain the same.

Thanks!

@ryansch
Copy link
Contributor

ryansch commented May 13, 2014

Why does the AR version call model.__elasticsearch__.as_indexed_json but the mongoid version calls a.as_indexed_json? It seems like they should both access the #as_indexed_json method the same way.

@AaronRustad
Copy link
Contributor Author

I didn't dig into it, but the current code in master for the Mongoid adapter does that:

https://github.com/elasticsearch/elasticsearch-rails/blob/master/elasticsearch-model/lib/elasticsearch/model/adapters/mongoid.rb#L73

@ryansch
Copy link
Contributor

ryansch commented May 13, 2014

Agreed. Sorry I meant that question for @karmi and should have mentioned him in it.

@karmi
Copy link
Contributor

karmi commented May 14, 2014

@ryansch Might be just some oversight on my side, which slips through because of all the method proxying -- I'll have a look at that, thanks!

@AaronRustad
Copy link
Contributor Author

Is there anything I can do to help out on this one?

@karmi
Copy link
Contributor

karmi commented May 20, 2014

Aaron, I need to find time to evaluate the code, and merge it... Is it all 100% done on your side? Would like to get to it this week.

@AaronRustad
Copy link
Contributor Author

It's complete as far as what I wanted to add, but you've been pretty good at identifying improvements 😉 . @ryansch brought up the question as to why the Mongoid adapter calls as_indexed_json through __elasticsearch__. I think that's a quick fix though.

@karmi
Copy link
Contributor

karmi commented May 20, 2014

Yeah, that's a problem in the Mongoid adapter, let's keep that separate for fixing. I'll look into it soon, promise.

@karmi
Copy link
Contributor

karmi commented May 21, 2014

@AaronRustad Merged into master, thanks!

BTW, I don't think we should support this in the Rake tasks? Let's have people write 5 lines of Ruby, when they needed. The Rake task would need some environment variable, eval it, ... bleh. Not worth it in my opinion.

@AaronRustad
Copy link
Contributor Author

You could be right. The included Rake task works nicely with the progress bars. There may be an expectation that the rake task works similarly with the new option.

@karmi
Copy link
Contributor

karmi commented May 21, 2014

... yeah, but then again, passing a lambda as a string on the command line, to be evaled? :) I like to think that the gem provides a very comfy infrastructure for people to write their own code, when needed. The progress bar itself is three lines of Ruby, etc etc etc.

I don't think we should aim to solve every imaginable requirement -- much the same thing applies to the default callbacks.

@AaronRustad
Copy link
Contributor Author

Fair enough. 👍

@miguelff
Copy link
Contributor

Changes in this PR introduced failures in tests, please check https://travis-ci.org/elastic/elasticsearch-rails/builds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants