Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Video reference scripts #1180

Merged
merged 9 commits into from
Jul 31, 2019
Merged

Conversation

fmassa
Copy link
Member

@fmassa fmassa commented Jul 29, 2019

This PR adds training and evaluation scripts for video models.

It also adds a few extra helper functions, which should ideally be integrated in PyTorch / Torchvision instead of being part of the reference scripts. For now they are added here to avoid having to worry about backwards-compatibility.

Some parts of the main training script needs cleanup, specially the part handling caching of the dataset.
I'm sending this PR now for early feedback.

Note that the first commit is only copying as is the training scripts from image classification, and do not need to be reviewed.

cc @bjuncek

fmassa added 5 commits July 31, 2019 02:11
Gives even slightly better results than expected, with 57.336 top1 clip accuracy. But we count some clips twice in this evaluation
@codecov-io
Copy link

codecov-io commented Jul 31, 2019

Codecov Report

Merging #1180 into master will increase coverage by <.01%.
The diff coverage is 86.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1180      +/-   ##
==========================================
+ Coverage   65.78%   65.78%   +<.01%     
==========================================
  Files          79       79              
  Lines        5834     5849      +15     
  Branches      887      890       +3     
==========================================
+ Hits         3838     3848      +10     
- Misses       1726     1730       +4     
- Partials      270      271       +1
Impacted Files Coverage Δ
torchvision/io/video.py 72% <100%> (+0.57%) ⬆️
torchvision/datasets/video_utils.py 85.18% <100%> (+1.57%) ⬆️
torchvision/datasets/kinetics.py 34.78% <25%> (-5.22%) ⬇️
torchvision/transforms/transforms.py 80.35% <0%> (-0.59%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2287c8f...2c9624f. Read the comment docs.

@@ -23,4 +24,7 @@ def __getitem__(self, idx):
video, audio, info, video_idx = self.video_clips.get_clip(idx)
label = self.samples[video_idx][1]

if self.transform is not None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjuncek @soumith Looking for your opinions on how we should call the video transforms.

Indeed, we might have both video_transform and audio_transform, given that the dataset returns both data.

Would you prefer to stick with transform for meaning video_transform, or choose a different name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, I think we should keep this as transform, and maybe have a wrapper for audio transforms, or wait until batched transforms for both audio and video are ready

Copy link
Contributor

@bjuncek bjuncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and we were able to match the performance of the baseline models and beat prototype implementation

@fmassa
Copy link
Member Author

fmassa commented Jul 31, 2019

For reference and to complement @bjuncek note, training on Kinetics400 for r2plus1d_18 gives the following results for clip-len=16:

Clip Acc@1 57.351 Clip Acc@5 78.523

which matches the expected results from https://github.com/facebookresearch/VMZ/blob/master/tutorials/models.md

@fmassa fmassa merged commit 5c0b7f3 into pytorch:master Jul 31, 2019
@fmassa fmassa deleted the video-reference-scripts branch July 31, 2019 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants