Update audio preprocessing tutorial #797

CamiWilliams · 2019-12-20T20:36:17Z

Updating the audio tutorial for the Jan 7 release

support for batching in transforms. (Batching for transforms audio#337, Complex STFT transform from spectrogram audio#327)
torchaudio.functional.gain and torchaudio.functional.dither. (Add functionals gain, dither, scale_to_interval audio#319, Remove scale to interval from #319 audio#360)
data augmentations in torchaudio.augmentation: TimeStretch, FrequencyMasking, TimeMasking. (torchaudio-contrib: some augmentations audio#285, follow convention for dimension. audio#333, Move augmentations in transforms audio#348)
- Going to add example in next iteration of tutorial
compute deltas: torchaudio.functional.compute_deltas. (Compute deltas audio#268, make all functional jit script audio#326)
torchaudio.functional.equalizer_biquad. (Add peaking equalizer filter audio#315, fix python 2 flake8 error with character encoding audio#340)
torchaudio.functional.lowpass_biquad and torchaudio.functional.highpass_biquad. (biquad filter similar to SoX audio#275)
torchaudio.kaldi.mfcc (https://pytorch.org/audio/functional.html#mfcc ). (Kaldi MFCC audio#228)
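
For readers unfamiliar with the compute-deltas item above, here is a minimal pure-Python sketch of the standard delta (regression) formula over a 1-D feature track. It is not the torchaudio implementation: the helper name `deltas_sketch` is hypothetical, and the 5-frame window and edge replication are assumptions made for this illustration.

```python
def deltas_sketch(frames, win_length=5):
    """Delta coefficients of a 1-D feature track (illustrative only).

    d[t] = sum_{i=1..n} i * (c[t+i] - c[t-i]) / (2 * sum_{i=1..n} i^2)
    with n = (win_length - 1) // 2 and edges handled by repeating the
    first and last frame (an assumption of this sketch).
    """
    if win_length < 3:
        raise ValueError("win_length must be at least 3")
    n = (win_length - 1) // 2
    denominator = 2 * sum(i * i for i in range(1, n + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        total = 0.0
        for i in range(1, n + 1):
            ahead = frames[min(t + i, last)]  # repeat the last frame past the end
            behind = frames[max(t - i, 0)]    # repeat the first frame before the start
            total += i * (ahead - behind)
        deltas.append(total / denominator)
    return deltas


# A linear ramp has a slope of 1 away from the edges.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(deltas_sketch(ramp))
```

Away from the edges the ramp yields a delta of 1; near the edges the repeated frames pull the estimate toward zero.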

In a following update of the tutorial, we will want:

We introduce a backend switch mechanism to replace sox by something else. We currently support soundfile along with sox. The backends are loaded lazily when needed. (Backend switch audio#355)
We now have a unified datasets interface with lazy loading of files to memory, new download and extract functions, and new datasets: LibriSpeech and CommonVoice. (new dataset format with librispeech and commonvoice audio#303, dataset path. audio#316, Data points remain tuples audio#330)

Lower priority items:

We make JIT compilation optional for functions and use nn.Module where possible. (Make jit compilation optional for function and use nn.Module audio#314, make all functional jit script audio#326, decorating equalizer_biquad with jit audio#342)

netlify · 2019-12-20T20:44:28Z

Deploy preview for pytorch-tutorials-preview ready!

Built with commit e08a480

https://deploy-preview-797--pytorch-tutorials-preview.netlify.com

vincentqb · 2019-12-20T20:51:50Z

Here's the existing tutorial, the source code, added in this PR derived from this notebook using this script.

vincentqb · 2019-12-20T20:55:13Z

Here are the notes I wrote from our chats :)

When the binaries are released on Jan 7, we want to showcase in the tutorial:

support for batching in transforms. (Batching for transforms audio#337, Complex STFT transform from spectrogram audio#327)
torchaudio.functional.gain and torchaudio.functional.dither. (Add functionals gain, dither, scale_to_interval audio#319, Remove scale to interval from #319 audio#360)
data augmentations in torchaudio.augmentation: TimeStretch, FrequencyMasking, TimeMasking. (torchaudio-contrib: some augmentations audio#285, follow convention for dimension. audio#333, Move augmentations in transforms audio#348)
compute deltas: torchaudio.functional.compute_deltas. (Compute deltas audio#268, make all functional jit script audio#326)
torchaudio.functional.equalizer_biquad. (Add peaking equalizer filter audio#315, fix python 2 flake8 error with character encoding audio#340)
torchaudio.functional.lowpass_biquad and torchaudio.functional.highpass_biquad. (biquad filter similar to SoX audio#275)
torchaudio.functional.mfcc (https://pytorch.org/audio/functional.html#mfcc ). (Kaldi MFCC audio#228)

In a following update of the tutorial, we will want:

We introduce a backend switch mechanism to replace sox by something else. We currently support soundfile along with sox. The backends are loaded lazily when needed. (Backend switch audio#355)
We now have a unified datasets interface with lazy loading of files to memory, new download and extract functions, and new datasets: LibriSpeech and CommonVoice. (new dataset format with librispeech and commonvoice audio#303, dataset path. audio#316, Data points remain tuples audio#330)

Lower priority items:

We make JIT compilation optional for functions and use nn.Module where possible. (Make jit compilation optional for function and use nn.Module audio#314, make all functional jit script audio#326, decorating equalizer_biquad with jit audio#342)

vincentqb · 2019-12-27T15:32:56Z

FYI #600 is an open issue about generated plots and white space around them

CamiWilliams · 2020-01-02T22:55:11Z

I need some help with the example for augmentations. I tried to replicate the PR example in my Colab, but I am missing some pieces:

num_freqs, hop_length = 400, 512
model = torch.nn.Sequential(torch.stft(waveform, frame_length=hop_length, fft_size=(num_freqs - 1) * 2),
                           torchaudio.transforms.TimeStretch(freq=num_freqs, hop_length=hop_length, fixed_rate=1.3),
                           torchaudio.transforms.ComplexNorm(power=2.0),
                           torchaudio.transforms.FrequencyMasking(freq_mask_param=60, iid_masks=False),
                           torchaudio.transforms.TimeMasking(time_mask_param=30, iid_masks=False),
                           torchaudio.transforms.AmplitudeToDB() )
out = model(waveform)
plot_heatmap(out)

I am constructing torch.stft incorrectly. How would I use it on my current waveform?
Where can I find implementations of model and plot_heatmap?
Are the augmentations going to live in torchaudio.augmentations or torchaudio.transforms? Found code for both.

vincentqb

Thanks for working on this!

beginner_source/audio_preprocessing_tutorial.py

vincentqb · 2020-01-03T16:40:02Z

I need some help with the example for augmentations. I tried to replicate the PR example in my Colab, but I am missing some pieces:

num_freqs, hop_length = 400, 512
model = torch.nn.Sequential(torch.stft(waveform, frame_length=hop_length, fft_size=(num_freqs - 1) * 2),
                           torchaudio.transforms.TimeStretch(freq=num_freqs, hop_length=hop_length, fixed_rate=1.3),
                           torchaudio.transforms.ComplexNorm(power=2.0),
                           torchaudio.transforms.FrequencyMasking(freq_mask_param=60, iid_masks=False),
                           torchaudio.transforms.TimeMasking(time_mask_param=30, iid_masks=False),
                           torchaudio.transforms.AmplitudeToDB() )
out = model(waveform)
plot_heatmap(out)

I am constructing torch.stft incorrectly. How would I use it on my current waveform?

torchaudio.transforms.Spectrogram with power=None does that: it returns the complex spectrogram rather than a magnitude or power spectrogram.

Where can I find implementations of model and plot_heatmap?

You are constructing model yourself in the example when you do:

model = torch.nn.Sequential(...)

Is that what you mean?

plot_heatmap is a custom function written by a user. How about using matplotlib's imshow, or one of the plotting approaches we already use in the tutorial?

Are the augmentations going to live in torchaudio.augmentations or torchaudio.transforms? Found code for both.

They are in torchaudio.transforms, see code for v0.4.0.
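
To make the masking augmentations discussed above concrete, here is a rough pure-Python sketch of what frequency and time masking do to a spectrogram: a randomly placed band of rows (frequency bins) or columns (time frames) is overwritten with a constant. This is an illustration only, not the torchaudio code. The helper name `mask_axis_sketch`, the nested-list spectrogram, and the way the width and start are sampled are all assumptions.

```python
import random


def mask_axis_sketch(spec, max_width, axis, rng, mask_value=0.0):
    """Return a masked copy of `spec` plus the band that was masked.

    `spec` is a list of rows (frequency bins), each a list over time.
    axis=0 masks a band of frequency bins; axis=1 masks a band of frames.
    """
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == 0 else n_time
    # Sample a band width up to max_width, then a start that keeps it in range.
    width = rng.randint(0, min(max_width, size))
    start = rng.randint(0, size - width)
    masked = [row[:] for row in spec]  # copy so the input is left untouched
    for f in range(n_freq):
        for t in range(n_time):
            index = f if axis == 0 else t
            if start <= index < start + width:
                masked[f][t] = mask_value
    return masked, start, width


rng = random.Random(0)  # seeded so repeated runs give the same masks
spec = [[1.0] * 6 for _ in range(4)]  # toy spectrogram: 4 bins x 6 frames
freq_masked, f_start, f_width = mask_axis_sketch(spec, max_width=2, axis=0, rng=rng)
both_masked, t_start, t_width = mask_axis_sketch(freq_masked, max_width=3, axis=1, rng=rng)
for row in both_masked:
    print(row)
```

Applying the frequency mask and then the time mask mirrors the order of FrequencyMasking and TimeMasking in the pipeline quoted above.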

vincentqb

Thanks for the update!

The release was pushed to Jan 14, so we have a few more days :)

beginner_source/audio_preprocessing_tutorial.py

vincentqb

I gave some feedback on dataset section, but otherwise LGTM!

We need to rebase/merge master, since the branch is out-of-date.
We'll take a look at the preview once it is regenerated.
We need to test locally, since CI will fail until torchaudio is released.
We can merge this PR once torchaudio is released :)

beginner_source/audio_preprocessing_tutorial.py

brianjo · 2020-01-13T18:10:09Z

@CamiWilliams @vincentqb Could you all look at the build logs for this and flag what the build issue would be: https://app.circleci.com/jobs/github/pytorch/tutorials/21752 Assuming it might need 1.4 or an updated torchaudio version? Thanks! FYI @jlin27

vincentqb · 2020-01-13T18:26:47Z

@CamiWilliams @vincentqb Could you all look at the build logs for this and flag what the build issue would be: https://app.circleci.com/jobs/github/pytorch/tutorials/21752 Assuming it might need 1.4 or an updated torchaudio version? Thanks! FYI @jlin27

Jan 13 16:54:55 AttributeError: module 'torchaudio.functional' has no attribute 'compute_deltas'

This requires torchaudio 0.4.0, which isn't released yet.
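
As a side note, a script can check for a feature like this before relying on it, so that a too-old install produces a clear message instead of an AttributeError partway through the build. The following is a minimal sketch using only the standard library. The helper name `has_attribute` is hypothetical, and nothing here is part of the tutorial itself.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # A missing package or a shared library that fails to load.
        return False
    return hasattr(module, attribute)


if not has_attribute("torchaudio.functional", "compute_deltas"):
    print("compute_deltas is unavailable; torchaudio 0.4.0 or newer is needed.")
```

Checking for the attribute rather than parsing a version string keeps the guard tied to the feature that is actually used.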

brianjo · 2020-01-13T20:32:21Z

Great. Thanks for the heads up!

beginner_source/audio_preprocessing_tutorial.py

vincentqb

Thanks, LGTM! As mentioned before, once torchaudio 0.4.0 is released, we should rebase, rerun the build, and merge.

vincentqb · 2020-01-16T01:24:35Z

Binaries for torchaudio are now available :)

Reference: pytorch/audio#410

vincentqb · 2020-01-16T01:44:09Z

The error below is due to sox, see pytorch/audio#171.

Jan 16 00:31:36 Unexpected failing examples:
Jan 16 00:31:36 /var/lib/jenkins/workspace/beginner_source/audio_preprocessing_tutorial.py failed leaving traceback:
Jan 16 00:31:36 Traceback (most recent call last):
Jan 16 00:31:36   File "/opt/conda/lib/python3.6/site-packages/sphinx_gallery/gen_rst.py", line 394, in _memory_usage
Jan 16 00:31:36     out = func()
Jan 16 00:31:36   File "/opt/conda/lib/python3.6/site-packages/sphinx_gallery/gen_rst.py", line 382, in __call__
Jan 16 00:31:36     exec(self.code, self.globals)
Jan 16 00:31:36   File "/var/lib/jenkins/workspace/beginner_source/audio_preprocessing_tutorial.py", line 21, in <module>
Jan 16 00:31:36     import torchaudio
Jan 16 00:31:36   File "/opt/conda/lib/python3.6/site-packages/torchaudio/__init__.py", line 5, in <module>
Jan 16 00:31:36     import _torch_sox
Jan 16 00:31:36 ImportError: /opt/conda/lib/python3.6/site-packages/_torch_sox.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_29E

@CamiWilliams -- Can you trigger a rerun of the tests to see if the error persists? I can't reproduce offline. Let's wait until we know what happens before merging.

vincentqb · 2020-01-16T18:47:47Z

Yay! Checks are all green now. Ready to merge!

brianjo · 2020-01-16T22:26:29Z

Nice job @CamiWilliams. Output looks good. :)

vincentqb suggested changes Jan 3, 2020

CamiWilliams changed the title ~~[WIP] Update audio preprocessing tutorial~~ Update audio preprocessing tutorial Jan 7, 2020

vincentqb suggested changes Jan 8, 2020

vincentqb suggested changes Jan 9, 2020

vincentqb reviewed Jan 13, 2020
