In this work, we employ variational inference and stochastic process modeling to develop a framework called Motion Code.
The trained Motion Code models are available in the `saved_models` folder, as they are small. For reproducibility, however, you need to download the datasets, which we cannot store directly in the repo due to limited storage. The trained attention-based benchmarking models are also available for download. To download the datasets and attention-based benchmarking models, follow these 3 steps:
- Go to the download link https://www.swisstransfer.com/d/b3ff7a9a-52fc-49b4-a25c-1e584707bd18 and download the zip file.
  - Password: `assets_for_motion_code`

  Note that it can take up to 10 minutes to download the file. Additionally, the link expires every month, but this repo is continuously updated, and you can always check this README.md for the updated link(s).
- Unzip the downloaded file. Inside is a `motion_code` folder, which contains 2 sub-folders, `data` and `TSLibrary`:
  - The `data` folder contains the experiment data (basic noisy datasets, audio, and Parkinson data). Copy this folder to the repo root (if a `data` folder already exists in the repo, copy its content over).
  - The `TSLibrary` folder contains 3 folders, which you need to add to the `TSLibrary` folder of the repo:
    - `dataset`: contains the .ts version of the `data` folder
    - `checkpoints`: contains the trained attention-based models
    - `results`: contains the classification results of the attention-based models
- Please make sure that you have `data`, `dataset`, `checkpoints`, and `results` downloaded and stored in the correct locations as instructed above. Once this is done, you're ready to run the tutorial notebooks and the other notebooks in the repo.
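If you want to double-check the layout, a minimal sketch like the following (our addition, not part of the repo), run from the repo root, verifies that the folders are in place:

```python
# Optional sanity check: verify the downloaded assets landed in the
# expected locations before running the notebooks.
from pathlib import Path

required = ["data", "TSLibrary/dataset", "TSLibrary/checkpoints", "TSLibrary/results"]
for folder in required:
    status = "ok" if Path(folder).is_dir() else "MISSING"
    print(f"{folder}: {status}")
```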
Please look at the tutorial notebook `tutorial_notebook.ipynb` to learn how to use Motion Code.
To initialize the model, use the code:

```python
from motion_code import MotionCode
model = MotionCode(m=10, Q=1, latent_dim=2, sigma_y=0.1)
```
For training the Motion Code model, use:

```python
model.fit(X_train, Y_train, labels_train, model_path)
```
Motion Code performs both classification and forecasting; a full end-to-end sketch follows this list.

- For the classification task, use:

  ```python
  model.classify_predict(X_test, Y_test)
  ```

- For the forecasting task, use:

  ```python
  mean, covar = model.forecast_predict(test_time_horizon, label=0)
  ```
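Putting the pieces together, a minimal end-to-end sketch might look as follows. The array shapes and the synthetic data here are assumptions on our part; treat `tutorial_notebook.ipynb` as the authoritative reference.

```python
# End-to-end sketch under assumptions: X holds timestamps, Y the observed
# series values, and labels the integer class labels, following the fit(...)
# signature shown above. The data below is a synthetic placeholder.
import numpy as np
from motion_code import MotionCode

# Toy collection: 10 series, 50 timestamps each, 2 classes.
X_train = [np.linspace(0, 1, 50) for _ in range(10)]
Y_train = [np.sin(2 * np.pi * x) + 0.1 * np.random.randn(50) for x in X_train]
labels_train = np.array([i % 2 for i in range(10)])

model = MotionCode(m=10, Q=1, latent_dim=2, sigma_y=0.1)
model.fit(X_train, Y_train, labels_train, "saved_models/toy_model")

# Classification (assumed to return predicted labels); here we simply
# reuse the training series as a stand-in test set.
pred = model.classify_predict(X_train, Y_train)

# Forecasting: predictive mean and covariance over a future time horizon.
test_time_horizon = np.linspace(1, 1.5, 25)
mean, covar = model.forecast_predict(test_time_horizon, label=0)
```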
All package prerequisites are given in `requirements.txt`. You can install them with:

```bash
pip install -r requirements.txt
```
To learn how to generate interpretable features, see `Pronunciation_Audio.ipynb` for a further tutorial. This notebook loads the audio data, trains a Motion Code model on it, and plots the interpretable features obtained from Motion Code's most informative timestamps.
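As a rough illustration of this kind of plot, the sketch below overlays a set of informative timestamps on a series with matplotlib. The curve and `informative_times` are placeholders we made up; the real values come from the trained model in the notebook.

```python
# Illustrative only: plot a series and highlight a set of informative
# timestamps. Both the series and `informative_times` are hypothetical
# placeholders, not output from the actual Motion Code model.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * 3 * t) * np.exp(-t)      # stand-in feature curve
informative_times = np.array([0.1, 0.35, 0.6])  # placeholder timestamps

plt.plot(t, y, label="series")
plt.scatter(informative_times, np.interp(informative_times, t, y),
            color="red", zorder=3, label="most informative timestamps")
plt.xlabel("time")
plt.legend()
plt.show()
```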
Here are some examples of the most informative timestamp features extracted from Motion Code, capturing different underlying dynamics:
*(Figure: humidity sensor vs. temperature sensor features on MoteStrain)*

*(Figure: winter vs. spring power demand features on ItalyPowerDemand)*

*(Figure: features for the words "absortivity" and "anything" in the Pronunciation audio data)*
The main benchmark file is `benchmarks.py`.
For benchmarking, we consider two types of models:
- Non-attention-based models and our model
- Attention-based models such as Informer and Autoformer
You can get all classification benchmarks in a highlighted manner by running the notebook `collect_all_benchmarks.ipynb`. Once the run completes, the output `out/all_classification_benchmark_results.html` will contain all classification benchmark results. To run more customized steps, follow the additional instructions below:
- Classification benchmarking on the basic datasets with noise:

  ```bash
  python benchmarks.py --dataset_type="basics" --load_existing_model=True --load_existing_data=True --output_path="out/classify_basics.csv"
  ```

- Forecasting benchmarking on the basic datasets with noise:

  ```bash
  python benchmarks.py --dataset_type="basics" --forecast=True --load_existing_model=True --load_existing_data=True --output_path="out/forecast_basics.csv"
  ```

- Classification and forecasting benchmarking on the (Pronunciation) Audio dataset:

  ```bash
  python benchmarks.py --dataset_type="pronunciation" --load_existing_model=True --load_existing_data=True --output_path="out/classify_pronunciation.csv"
  python benchmarks.py --dataset_type="pronunciation" --forecast=True --load_existing_model=True --load_existing_data=True --output_path="out/forecast_pronunciation.csv"
  ```

- Benchmarking on the Parkinson data, for either PD setting 1 or PD setting 2:

  ```bash
  python benchmarks.py --dataset_type="parkinson_1" --load_existing_model=True --output_path="out/classify_parkinson_1.csv"
  python benchmarks.py --dataset_type="parkinson_2" --load_existing_model=True --output_path="out/classify_parkinson_2.csv"
  ```
We use the Time Series Library (TSLibrary), stored in the `TSLibrary` folder. To rerun all training, execute the script:

```bash
bash TSLibrary/attention_benchmark.sh
```

For efficiency, it is recommended to use the existing trained models and run `collect_all_benchmarks.ipynb` to get the benchmark results.
The benchmark models use the following parameters:

- DTW: `distance="dtw"`
- TSF: `n_estimators=100`
- BOSS-E: `max_ensemble_size=3`
- Shapelet: `estimator=RotationForest(n_estimators=3)`, `n_shapelet_samples=100`, `max_shapelets=10`, `batch_size=20`
- SVC: `kernel=mean_gaussian_tskernel`
- LSTM-FCN: `n_epochs=200`
- Rocket: `num_kernels=500`
- Hive-Cote 2: `time_limit_in_minutes=0.2`
- Attention-based parameters: refer to `TSLibrary/attention_benchmark.sh`
- Exponential Smoothing: `trend="add"`, `seasonal="additive"`, `sp=12`
- ARIMA: `order=(1, 1, 0)`, `seasonal_order=(0, 1, 0, 12)`
- State-space: `level="local linear trend"`, `freq_seasonal=[{"period": 12, "harmonics": 10}]`
- TBATS: `use_box_cox=False`, `use_trend=False`, `use_damped_trend=False`, `sp=12`, `use_arma_errors=False`, `n_jobs=1`
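These settings match the constructor arguments of standard sktime estimators. Assuming that is the library in use (an assumption on our part; see `benchmarks.py` for the actual constructions), two of them would be instantiated roughly like this:

```python
# Sketch assuming the sktime implementations of these benchmarks; check
# benchmarks.py for the repo's actual estimator construction.
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

# TSF classifier with the n_estimators=100 setting listed above.
tsf = TimeSeriesForestClassifier(n_estimators=100)

# Exponential Smoothing forecaster with the listed trend/seasonal settings.
es = ExponentialSmoothing(trend="add", seasonal="additive", sp=12)
```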
The main visualization file is `visualize.py`.
To extract interpretable features from Motion Code, run:

```bash
python visualize.py --type="classify_motion_code" --dataset="PD setting 2"
```

Change the dataset argument as needed (e.g., `Pronunciation Audio`, `PD setting 1`, `PD setting 2`).
- To visualize forecasting with mean and variance:

  ```bash
  python visualize.py --type="forecast_mean_var" --dataset="ItalyPowerDemand"
  ```

- To visualize forecasting with informative timestamps:

  ```bash
  python visualize.py --type="forecast_motion_code" --dataset="ItalyPowerDemand"
  ```
- Tutorial notebooks: `tutorial_notebook.ipynb`, `Pronunciation_Audio.ipynb`
- `data` folder: contains three subfolders:
  - `basics`: basic datasets with noise
  - `audio`: Pronunciation Audio dataset
  - `parkinson`: Parkinson sensor dataset
- `saved_models` folder: contains already trained Motion Code models for inference and benchmarking
- Python files:
  - Data processing: `data_processing.py`, `parkinson_data_processing.py`, `utils.py`
  - Motion Code model: `motion_code.py`, `motion_code_utils.py`, `sparse_gp.py`
  - Benchmarking: `benchmarks.py` (non-attention), `collect_all_benchmarks.ipynb` (all)
  - Visualization: `visualize.py`
- Other notebooks: under the `notebooks` folder are `MotionCodeTSC_create.ipynb`, which converts .npy data to .ts data, and `Pronunciation_Audio.ipynb`, for extracting interpretable features from varying-length audio data
- Time Series Library: the `TSLibrary` folder contains `attention_benchmark.sh` for rerunning the training of all attention-based benchmarking models.