data_loading_tutorial.py iterators #995

Closed
patricknaughton01 opened this issue May 12, 2020 · 3 comments · Fixed by #2407
Labels: data loading (Issues relating to the data loading tutorial), docathon-h1-2023 (A label for the docathon in H1 2023), medium

Comments

@patricknaughton01

I like this tutorial, but I think it would be better if it included an example of how to define __next__() and __iter__() methods so that the dataset can be used with enumerate().

@holly1238 added the "data loading" label on Jul 27, 2021
@svekars added the "medium" and "docathon-h1-2023" labels on May 31, 2023
@noqqaqq
Contributor

noqqaqq commented Jun 1, 2023

/assigntome

@noqqaqq
Contributor

noqqaqq commented Jun 2, 2023

Some thoughts looking at this issue:

  1. __iter__ and __next__ are not needed to use enumerate: it works out of the box with __len__ and __getitem__. The tutorial can then easily be improved to use enumerate in relevant places without reimplementing the dataset class.
  2. There is IterableDataset. I guess implementing it would be the most fitting option for this example.
  3. Using IterableDataset raises some issues:
  • random access is lost, so that case requires looping over or skipping through the iterator (need to make sure to implement it in a way that minimizes the perf penalty)
  • IterableDataset does not allow for shuffling, which is needed when passing the dataset to DataLoader. There are ways to implement this, as described here or here or here, but I don't know yet how to do it in the simplest correct way for the sake of the tutorial.
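
To illustrate point 1, here is a minimal sketch in plain Python. ToyDataset is a hypothetical stand-in for the tutorial's dataset class, not code taken from the tutorial; it assumes only that __getitem__ raises IndexError past the last index, which is what lets Python's sequence-style iteration stop.

```python
class ToyDataset:
    """Hypothetical stand-in for a map-style dataset (no torch required)."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Indexing a list past its end raises IndexError, which signals
        # the end of iteration when no __iter__ method is defined.
        return self.samples[idx]


dataset = ToyDataset(["a", "b", "c"])

# enumerate() works even though __iter__ and __next__ are not defined.
pairs = list(enumerate(dataset))
for i, sample in pairs:
    print(i, sample)
```

A dataset whose __getitem__ never raises IndexError (for example, one that wraps indices around) would loop forever here, so the sketch depends on that assumption.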

To sum up, I'll follow up by pushing a patch for 1. and try to copy and modify the dataset tutorial to cover the iterable case (having both in a single file would, in my opinion, obfuscate the examples and make them hard to track / understand).

If anyone has comments on this or hints on shuffling, please let me know.
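
On the shuffling question, one common approach for streamed data is a fixed-size shuffle buffer. The sketch below is plain Python and is not necessarily what the linked resources describe; BufferedShuffleStream, buffer_size, and seed are hypothetical names chosen for illustration. In PyTorch, the same __iter__ logic could presumably live in a subclass of torch.utils.data.IterableDataset.

```python
import random


class BufferedShuffleStream:
    """Hypothetical iterable-style dataset with approximate shuffling."""

    def __init__(self, source, buffer_size, seed=None):
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.source = source
        self.buffer_size = buffer_size
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        buffer = []
        for item in self.source:
            if len(buffer) < self.buffer_size:
                # Fill the buffer before emitting anything.
                buffer.append(item)
                continue
            # Buffer is full: emit a random element and put the new
            # item in its slot.
            idx = rng.randrange(len(buffer))
            yield buffer[idx]
            buffer[idx] = item
        # Source exhausted: emit what is left in random order.
        rng.shuffle(buffer)
        yield from buffer


stream = BufferedShuffleStream(range(10), buffer_size=4, seed=0)
shuffled = list(stream)
print(shuffled)
```

The order is only approximately random: an element can never be emitted before it has been read, so with a small buffer the output stays close to the source order. A larger buffer_size trades memory for better mixing.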

@NicolasHug
Member

I agree with @noqqaqq's thoughts that there's no need for __next__() and __iter__() to be present for enumerate() to work. The original point of the issue may have been to include an example for iterable datasets, which define __iter__() (and possibly __next__()) instead of __getitem__(), but as @noqqaqq also correctly pointed out, there are other resources out there for that, and that's not the original point of this tutorial.

For the simple purpose of exposing how to use enumerate(), #2407 LGTM
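
For readers who came here for the original request, a minimal sketch of a class that defines __iter__() and __next__() directly follows. SampleStream is a hypothetical name, and this is plain Python rather than anything from the tutorial or from #2407.

```python
class SampleStream:
    """Hypothetical iterator-style container using __iter__/__next__."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.position = 0

    def __iter__(self):
        # Reset so the object can be iterated more than once.
        self.position = 0
        return self

    def __next__(self):
        if self.position >= len(self.samples):
            raise StopIteration
        sample = self.samples[self.position]
        self.position += 1
        return sample


stream_pairs = list(enumerate(SampleStream(["x", "y"])))
print(stream_pairs)
```

Because the object is its own iterator, two loops over the same instance share one position counter; writing __iter__ as a generator method avoids that if nested iteration matters.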
