Video reference scripts #1180
This gives slightly better results than expected, with 57.336 top-1 clip accuracy. Note, however, that some clips are counted twice in this evaluation.
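To illustrate how double-counted clips can shift a clip-level metric, here is a minimal sketch in plain Python. It is not the evaluation code from this PR: the function name `top1_clip_accuracy`, the `(clip_id, predicted, target)` record layout, and the toy data are all assumptions made for illustration.

```python
def top1_clip_accuracy(records, deduplicate=True):
    """Top-1 accuracy (in percent) over clip-level predictions.

    `records` is an iterable of (clip_id, predicted_label, true_label).
    This layout is hypothetical; it is not taken from the PR.
    """
    seen = set()
    correct = 0
    total = 0
    for clip_id, predicted, target in records:
        # Skip clips that were already scored when deduplication is on.
        if deduplicate and clip_id in seen:
            continue
        seen.add(clip_id)
        correct += int(predicted == target)
        total += 1
    return 100.0 * correct / total if total else 0.0


# Toy data: clip 2 appears twice and is classified correctly both times.
records = [(0, 1, 1), (1, 2, 0), (2, 3, 3), (2, 3, 3)]

with_duplicates = top1_clip_accuracy(records, deduplicate=False)
without_duplicates = top1_clip_accuracy(records, deduplicate=True)
```

In this toy case the duplicated clip is a correct prediction, so counting it twice inflates the score; a duplicated wrong prediction would deflate it instead.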
Codecov Report
@@ Coverage Diff @@
## master #1180 +/- ##
==========================================
+ Coverage 65.78% 65.78% +<.01%
==========================================
Files 79 79
Lines 5834 5849 +15
Branches 887 890 +3
==========================================
+ Hits 3838 3848 +10
- Misses 1726 1730 +4
- Partials 270 271 +1
@@ -23,4 +24,7 @@ def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        label = self.samples[video_idx][1]

        if self.transform is not None:
For this PR, I think we should keep this as `transform`, and maybe add a wrapper for audio transforms, or wait until batched transforms for both audio and video are ready.
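A wrapper along those lines could look roughly like the sketch below. It assumes the wrapped dataset returns `(video, audio, label)` tuples, which the diff above does not confirm, and the class name `AudioTransformWrapper` is hypothetical rather than part of torchvision. Plain lists stand in for tensors so the example runs without any dependencies.

```python
class AudioTransformWrapper:
    """Hypothetical wrapper that applies a transform to the audio only.

    Assumes `dataset[idx]` returns a (video, audio, label) tuple; the
    video and label are passed through unchanged.
    """

    def __init__(self, dataset, audio_transform):
        self.dataset = dataset
        self.audio_transform = audio_transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        video, audio, label = self.dataset[idx]
        return video, self.audio_transform(audio), label


# Toy stand-in for a video dataset: one sample of (video, audio, label).
toy_dataset = [([1, 2], [0.5, -0.5], 3)]

# Example audio transform: scale every audio sample by 2.
wrapped = AudioTransformWrapper(toy_dataset, lambda audio: [2 * x for x in audio])
sample = wrapped[0]
```

Keeping the dataset's own `transform` for video and layering audio handling on top means the dataset signature would not need to change if batched transforms arrive later.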
Looks good to me. We were able to match the performance of the baseline models and beat the prototype implementation.
For reference, and to complement @bjuncek's note, training on Kinetics400 for
which matches the expected results from https://github.com/facebookresearch/VMZ/blob/master/tutorials/models.md
This PR adds training and evaluation scripts for video models.
It also adds a few extra helper functions, which should ideally be integrated in PyTorch / Torchvision instead of being part of the reference scripts. For now they are added here to avoid having to worry about backwards-compatibility.
Some parts of the main training script need cleanup, especially the part that handles caching of the dataset.
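For readers unfamiliar with the idea, dataset caching of this kind usually means building an expensive index once and reloading it on later runs. The sketch below shows one generic way to do that with the standard library. It is not the code from this PR: the helper names `cache_path_for` and `load_or_build`, the use of `pickle`, and the truncated SHA-1 file name are all assumptions.

```python
import hashlib
import os
import pickle
import tempfile


def cache_path_for(dataset_dir, cache_dir):
    # Derive a stable file name from the dataset location (hypothetical scheme).
    digest = hashlib.sha1(dataset_dir.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest[:10] + ".pkl")


def load_or_build(dataset_dir, cache_dir, build_fn):
    """Return the cached index if present, otherwise build and cache it."""
    path = cache_path_for(dataset_dir, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = build_fn(dataset_dir)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


# Usage: the second call is served from the cache, so build_index runs once.
build_calls = []


def build_index(dataset_dir):
    build_calls.append(dataset_dir)
    return {"clips": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build("/data/kinetics400/train", tmp, build_index)
    second = load_or_build("/data/kinetics400/train", tmp, build_index)
```

A cache keyed only on the directory path goes stale if the files or clip parameters change, which is one reason this part of a training script tends to need care.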
I'm sending this PR now for early feedback.
Note that the first commit only copies the training scripts from image classification as is, and does not need to be reviewed.
cc @bjuncek