
Commit 3712369

benjamin-work authored and ottonemo committed Dec 13, 2018
Feature: quick CLIs (skorch-dev#390)
* Helper functions for CLIs with almost no boilerplate. Add a helper function parse_args that makes it very simple to build custom CLIs. Add an example for the usage of this and extend docs.
* Extend and adjust README and docs.
* Add CLI implementation and unit tests.
* Update dev requirements: pytest >= 3.4.
* Add fire library to dev requirements.
* Remove fire from dev requirements, install it in travis. fire is not on the conda channels, so the install would fail. Also, modify the cli tests to be skipped if fire is not installed.
* Correct typos.
* Add the option to have custom defaults. E.g., if you would like to use batch_size=256 as a default instead of 128, you can now pass a dict `{'batch_size': 256}` to `parse_args`. This will not only update your model to use those defaults but also change the help to show your custom defaults. To achieve the latter effect, it was necessary to parse the sklearn docstrings for default values and replace them with the new default. This turned out to be more difficult than expected because the docstring defaults are not always written in the same fashion. I tried to catch some variants that I found, but there are certainly more variants out there. It should, however, work fine with the way we write docstrings in skorch.
* Fix typo in docs/user/helper.rst (Co-Authored-By: benjamin-work <benjamin.bossan@ottogroup.com>)
* Update docstring, remove unnecessary try..except.
* Simplify the function that matches the span for a docstring match.
1 parent df1099a · commit 3712369

File tree

9 files changed: +1174 -2 lines changed

.travis.yml
CHANGES.md
docs/user/helper.rst
examples/cli/README.md
examples/cli/train.py
requirements-dev.txt
skorch/cli.py
skorch/helper.py
skorch/tests/test_cli.py
 

.travis.yml

+1

@@ -30,6 +30,7 @@ install:
 - source activate skorch-env
 - cat requirements.txt requirements-dev.txt > reqs.txt
 - conda install --file=reqs.txt
+- pip install fire
 - pip install .
 - conda install -c pytorch pytorch-cpu==${PYTORCH_VERSION}
 script:

CHANGES.md

+3

@@ -22,10 +22,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   a re-initialization of the optimizer (#369)
 - Support for scipy sparse CSR matrices as input (as, e.g., returned by sklearn's
   `CountVectorizer`); note that they are cast to dense matrices during batching
+- Helper functions to build command line interfaces with almost no
+  boilerplate, [example][1811191713] that shows usage

 [1810251445]: https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/Basic_Usage.ipynb
 [1810261633]: https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/Advanced_Usage.ipynb
 [1811011230]: https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/MNIST.ipynb
+[1811191713]: https://github.com/dnouri/skorch/tree/master/examples/cli

 ### Changed

docs/user/helper.rst

+159

@@ -5,6 +5,7 @@ Helper
This module provides helper functions and classes for the user. They
make working with skorch easier but are not used by skorch itself.

SliceDict
---------

@@ -16,3 +17,161 @@ length of the arrays and not the number of keys, and you get a
``dict``, you would normally not be able to use sklearn
:class:`~sklearn.model_selection.GridSearchCV` and similar things;
with :class:`.SliceDict`, this works.


Command line interface helpers
------------------------------

Often you want to wrap up your experiments by writing a small script
that allows others to reproduce your work. With the help of skorch and
the fire_ library, it becomes very easy to write command line
interfaces without boilerplate. All arguments pertaining to skorch or
its PyTorch module are immediately available as command line
arguments, without the need to write a custom parser. If docstrings
in the numpydoc_ specification are available, there is also
comprehensive help for the user. Overall, this allows you to make
your work reproducible without the usual hassle.

There is an example_ in the skorch repository that shows how to use
the CLI tools. Below is a snippet that shows the output created by the
help function without writing a single line of argument parsing:

.. code:: bash

    $ python examples/cli/train.py pipeline --help

    <SelectKBest> options:
      --select__score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.
      --select__k : int or "all", optional, default=10
        Number of top features to select.
        The "all" option bypasses selection, for use in a parameter search.

    ...

    <NeuralNetClassifier> options:
      --net__module : torch module (class or instance)
        A PyTorch :class:`~torch.nn.Module`. In general, the
        uninstantiated class should be passed, although instantiated
        modules will also work.
      --net__criterion : torch criterion (class, default=torch.nn.NLLLoss)
        Negative log likelihood loss. Note that the module should return
        probabilities, the log is applied during ``get_loss``.
      --net__optimizer : torch optim (class, default=torch.optim.SGD)
        The uninitialized optimizer (update rule) used to optimize the
        module
      --net__lr : float (default=0.01)
        Learning rate passed to the optimizer. You may use ``lr`` instead
        of using ``optimizer__lr``, which would result in the same outcome.
      --net__max_epochs : int (default=10)
        The number of epochs to train for each ``fit`` call. Note that you
        may keyboard-interrupt training at any time.
      --net__batch_size : int (default=128)
        ...
      --net__verbose : int (default=1)
        Control the verbosity level.
      --net__device : str, torch.device (default='cpu')
        The compute device to be used. If set to 'cuda', data in torch
        tensors will be pushed to cuda tensors before being sent to the
        module.

    <MLPClassifier> options:
      --net__module__hidden_units : int (default=10)
        Number of units in hidden layers.
      --net__module__num_hidden : int (default=1)
        Number of hidden layers.
      --net__module__nonlin : torch.nn.Module instance (default=torch.nn.ReLU())
        Non-linearity to apply after hidden layers.
      --net__module__dropout : float (default=0)
        Dropout rate. Dropout is applied between layers.

Installation
^^^^^^^^^^^^

To use this functionality, you need some further libraries that are
not part of skorch, namely fire_ and numpydoc_. You can install them
as follows:

.. code:: bash

    pip install fire numpydoc

Usage
^^^^^

When you write your own script, only the following bits need to be
added:

.. code:: python

    import fire
    from skorch.helper import parse_args

    # your model definition and data fetching code below
    ...

    def main(**kwargs):
        X, y = get_data()
        my_model = get_model()

        # important: wrap the model with the parsed arguments
        parsed = parse_args(kwargs)
        my_model = parsed(my_model)

        my_model.fit(X, y)


    if __name__ == '__main__':
        fire.Fire(main)

This even works if your neural net is part of an sklearn pipeline, in
which case the help extends to all other estimators of your pipeline.

In case you would like to change some defaults for the net (e.g. using
a ``batch_size`` of 256 instead of 128), this is also possible. You
should have a dictionary containing your new defaults and pass it as
an additional argument to ``parse_args``:

.. code:: python

    my_defaults = {'batch_size': 256, 'module__hidden_units': 30}

    def main(**kwargs):
        ...
        parsed = parse_args(kwargs, defaults=my_defaults)
        my_model = parsed(my_model)

This will update the displayed help to your new defaults, as well as
set the parameters on the net or pipeline for you. However, the
arguments passed via the command line have precedence. Thus, if you
additionally pass ``--batch_size 512`` to the script, the batch size
will be 512.

Restrictions
^^^^^^^^^^^^

Almost all arguments should work out of the box. Therefore, you get
command line arguments for the number of epochs, learning rate, batch
size, etc. for free. Moreover, you can access the module parameters
with the double-underscore notation as usual with skorch
(e.g. ``--module__num_units 100``). This should cover almost all
common cases.

Parsing command line arguments that are non-primitive Python objects
is more difficult, though. skorch's custom parsing should support
normal Python types and simple custom objects, e.g. this works:
``--module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)'``. More complex
parsing might not work. E.g., it is currently not possible to add new
callbacks through the command line (but you can modify existing ones
as usual).


.. _fire: https://github.com/google/python-fire
.. _numpydoc: https://github.com/numpy/numpydoc
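The ``torch.nn.RReLU(0.1, upper=0.4)`` string above goes through the dotted-name resolution implemented in skorch/cli.py (shown later in this commit). A minimal sketch of what that parsing accepts, assuming torch plus the fire and numpydoc dependencies are installed; ``_resolve_dotted_name`` is a private helper, so this is illustration only, not public API:

```python
# Illustration only: _resolve_dotted_name is a private helper from this commit.
from skorch.cli import _resolve_dotted_name

# Dotted names resolve to objects; call syntax instantiates them:
print(_resolve_dotted_name('math.cos'))                        # <built-in function cos>
print(_resolve_dotted_name('torch.nn.RReLU(0.1, upper=0.4)'))  # RReLU(lower=0.1, upper=0.4)
# Values without a dot (or non-strings) pass through unchanged:
print(_resolve_dotted_name('plain string'))                    # plain string (unchanged)
```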

examples/cli/README.md

+144

@@ -0,0 +1,144 @@

# skorch helpers for command line interfaces (CLIs)

Often you want to wrap up your experiments by writing a small script
that allows others to reproduce your work. With the help of skorch and
the fire library, it becomes very easy to write command line
interfaces without boilerplate. All arguments pertaining to skorch or
its PyTorch module are immediately available as command line
arguments, without the need to write a custom parser. If docstrings in
the numpydoc specification are available, there is also comprehensive
help for the user. Overall, this allows you to make your work
reproducible without the usual hassle.

This example is a showcase of how easy CLIs become with skorch.

## Installation

To use this functionality, you need some further libraries that are not
part of skorch, namely fire and numpydoc. You can install them as follows:

```bash
pip install fire numpydoc
```

## Usage

The `train.py` file contains an example of how to write your own CLI
with the help of skorch. As you can see, this file almost exclusively
consists of the proper logic; there is no argument parsing involved.

When you write your own script, only the following bits need to be
added:

```python
import fire
from skorch.helper import parse_args

# your model definition and data fetching code below
...

def main(**kwargs):
    X, y = get_data()
    my_model = get_model()

    # important: wrap the model with the parsed arguments
    parsed = parse_args(kwargs)
    my_model = parsed(my_model)

    my_model.fit(X, y)


if __name__ == '__main__':
    fire.Fire(main)
```

This even works if your neural net is part of an sklearn pipeline, in
which case the help extends to all other estimators of your pipeline.

In case you would like to change some defaults for the net (e.g. using
a `batch_size` of 256 instead of 128), this is also possible. You
should have a dictionary containing your new defaults and pass it as
an additional argument to `parse_args`:

```python
my_defaults = {'batch_size': 256, 'module__hidden_units': 30}

def main(**kwargs):
    ...
    parsed = parse_args(kwargs, defaults=my_defaults)
    my_model = parsed(my_model)
```

This will update the displayed help to your new defaults, as well as
set the parameters on the net or pipeline for you. However, the
arguments passed via the command line have precedence. Thus, if you
additionally pass `--batch_size 512` to the script, the batch size will
be 512.

For more information on how to use fire, follow [this
link](https://github.com/google/python-fire).

## Restrictions

Almost all arguments should work out of the box. Therefore, you get
command line arguments for the number of epochs, learning rate, batch
size, etc. for free. Moreover, you can access the module parameters
with the double-underscore notation as usual with skorch
(e.g. `--module__num_units 100`). This should cover almost all common
cases.

Parsing command line arguments that are non-primitive Python objects
is more difficult, though. skorch's custom parsing should support
normal Python types and simple custom objects, e.g. this works:
`--module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)'`. More complex
parsing might not work. E.g., it is currently not possible to add new
callbacks through the command line (but you can modify existing ones
as usual).

## Running the script

### Getting Help

In this example, there are two variants: only the net ("net") and the
net within an sklearn pipeline ("pipeline"). To get general help for
each, run:

```bash
python train.py net -- --help
python train.py pipeline -- --help
```

To get help for model-specific parameters, run:

```bash
python train.py net --help
python train.py pipeline --help
```

### Training a Model

To train with the defaults, run:

```bash
python train.py net       # only the net
python train.py pipeline  # net with pipeline
```

Example with just the net and some non-defaults:

```bash
python train.py net --n_samples 1000 --output_file 'model.pkl' --lr 0.1 --max_epochs 5 --device 'cuda' --module__hidden_units 50 --module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)' --callbacks__valid_acc__on_train --callbacks__valid_acc__name train_acc
```

Example with an sklearn pipeline:

```bash
python train.py pipeline --n_samples 1000 --net__lr 0.1 --net__module__nonlin 'torch.nn.LeakyReLU()' --scale__minmax__feature_range '(-2, 2)' --scale__normalize__norm l1
```

examples/cli/train.py

+207

@@ -0,0 +1,207 @@

"""Simple training script for a MLP classifier.

See accompanying README.md for more details.

"""

import pickle

import fire
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import Normalizer
from skorch import NeuralNetClassifier
import torch
from torch import nn

from skorch.helper import parse_args


np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)


# number of input features
N_FEATURES = 20

# number of classes
N_CLASSES = 2

# custom defaults for net
DEFAULTS_NET = {
    'batch_size': 256,
    'module__hidden_units': 30,
}

# custom defaults for pipeline
DEFAULTS_PIPE = {
    'scale__minmax__feature_range': (-1, 1),
    'net__batch_size': 256,
    'net__module__hidden_units': 30,
}


class MLPClassifier(nn.Module):
    """A simple multi-layer perceptron module.

    This can be adapted for usage in different contexts, e.g. binary
    and multi-class classification, regression, etc.

    Note: This docstring is used to create the help for the CLI.

    Parameters
    ----------
    hidden_units : int (default=10)
        Number of units in hidden layers.

    num_hidden : int (default=1)
        Number of hidden layers.

    nonlin : torch.nn.Module instance (default=torch.nn.ReLU())
        Non-linearity to apply after hidden layers.

    dropout : float (default=0)
        Dropout rate. Dropout is applied between layers.

    """
    def __init__(
            self,
            hidden_units=10,
            num_hidden=1,
            nonlin=nn.ReLU(),
            dropout=0,
    ):
        super().__init__()
        self.hidden_units = hidden_units
        self.num_hidden = num_hidden
        self.nonlin = nonlin
        self.dropout = dropout

        self.reset_params()

    def reset_params(self):
        """(Re)set all parameters."""
        units = [N_FEATURES]
        units += [self.hidden_units] * self.num_hidden
        units += [N_CLASSES]

        sequence = []
        for u0, u1 in zip(units, units[1:]):
            sequence.append(nn.Linear(u0, u1))
            sequence.append(self.nonlin)
            sequence.append(nn.Dropout(self.dropout))

        sequence = sequence[:-2]
        self.sequential = nn.Sequential(*sequence)

    def forward(self, X):
        return nn.Softmax(dim=-1)(self.sequential(X))


def get_data(n_samples=100):
    """Get synthetic classification data with n_samples samples."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=N_FEATURES,
        n_classes=N_CLASSES,
        random_state=0,
    )
    X = X.astype(np.float32)
    return X, y


def get_model(with_pipeline=False):
    """Get a multi-layer perceptron model.

    Optionally, put it in a pipeline that scales the data.

    """
    model = NeuralNetClassifier(MLPClassifier)
    if with_pipeline:
        model = Pipeline([
            ('scale', FeatureUnion([
                ('minmax', MinMaxScaler()),
                ('normalize', Normalizer()),
            ])),
            ('select', SelectKBest(k=N_FEATURES)),  # keep input size constant
            ('net', model),
        ])
    return model


def save_model(model, output_file):
    """Save model to output_file, if given."""
    if not output_file:
        return

    with open(output_file, 'wb') as f:
        pickle.dump(model, f)
    print("Saved model to file '{}'.".format(output_file))


def net(n_samples=100, output_file=None, **kwargs):
    """Train an MLP classifier on synthetic data.

    Note: This docstring is used to create the help for the CLI.

    Parameters
    ----------
    n_samples : int (default=100)
        Number of training samples.

    output_file : str (default=None)
        If not None, file name used to save the model.

    kwargs : dict
        Additional model parameters.

    """
    model = get_model(with_pipeline=False)
    # important: wrap the model with the parsed arguments
    parsed = parse_args(kwargs, defaults=DEFAULTS_NET)
    model = parsed(model)

    X, y = get_data(n_samples=n_samples)
    print("Training MLP classifier")
    model.fit(X, y)

    save_model(model, output_file)


def pipeline(n_samples=100, output_file=None, **kwargs):
    """Train an MLP classifier in a pipeline on synthetic data.

    The pipeline scales the input data before passing it to the net.

    Note: This docstring is used to create the help for the CLI.

    Parameters
    ----------
    n_samples : int (default=100)
        Number of training samples.

    output_file : str (default=None)
        If not None, file name used to save the model.

    kwargs : dict
        Additional model parameters.

    """
    model = get_model(with_pipeline=True)
    # important: wrap the model with the parsed arguments
    parsed = parse_args(kwargs, defaults=DEFAULTS_PIPE)
    model = parsed(model)

    X, y = get_data(n_samples=n_samples)
    print("Training MLP classifier in a pipeline")
    model.fit(X, y)

    save_model(model, output_file)


if __name__ == '__main__':
    # register 2 functions, "net" and "pipeline"
    fire.Fire({'net': net, 'pipeline': pipeline})
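One subtlety in ``reset_params`` above is the ``sequence[:-2]`` slice: the loop appends a nonlinearity and a dropout layer after every ``Linear``, and the slice strips the trailing pair so the network ends on its output layer. A self-contained sketch of what this builds with the defaults from this file (assumes only torch):

```python
import torch
from torch import nn

# Mirror of MLPClassifier.reset_params with the defaults from train.py:
# N_FEATURES=20, N_CLASSES=2, hidden_units=10, num_hidden=1.
units = [20] + [10] * 1 + [2]
sequence = []
for u0, u1 in zip(units, units[1:]):
    sequence += [nn.Linear(u0, u1), nn.ReLU(), nn.Dropout(0)]

# Drop the trailing ReLU and Dropout so the net ends on Linear(10, 2);
# forward() then applies the softmax on top.
model = nn.Sequential(*sequence[:-2])
print(model)

probs = nn.Softmax(dim=-1)(model(torch.randn(4, 20)))
print(probs.shape)  # torch.Size([4, 2])
```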

requirements-dev.txt

+2 -2

@@ -1,11 +1,11 @@
+flaky
 jupyter
 matplotlib>=2.0.2
 numpydoc
 openpyxl
 pandas
 pylint
-pytest
+pytest>=3.4
 pytest-cov
 sphinx
 sphinx_rtd_theme
-flaky

skorch/cli.py

+336

@@ -0,0 +1,336 @@

"""Helper functions for quick command line interfaces with skorch and
fire.

"""

from functools import partial
from importlib import import_module
from itertools import chain
import re
import shlex
import sys

from sklearn.base import BaseEstimator
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline

try:
    from fire.parser import DefaultParseValue
except ImportError:
    raise ImportError("Using skorch cli helpers requires the fire library,"
                      " you can install it with pip: pip install fire.")

try:
    from numpydoc.docscrape import ClassDoc
except ImportError:
    raise ImportError("Using skorch cli helpers requires the numpydoc library,"
                      " you can install it with pip: pip install numpydoc.")


__all__ = ['parse_args']


# matches: bar(), foo.bar(), foo.bar(baz)
P_PARAMS = re.compile(r"(?P<name>^[a-zA-Z][a-zA-Z0-9_\.]*)(?P<params>\(.*\)$)")

P_DEFAULTS = re.compile(
    # standard, matches: int (default=123)
    r"(.+\s\(default\s?\=\s?(?P<default>.+)\)$)|"
    # no parens, matches: int, default=123
    r"(.+\sdefault\s?\=\s?(?P<default_np>.+)$)|"
    # no equal, matches: int, default 123
    r"(.+default\s(?P<default_ne>.+))|"
    # 'by-default', matches: str (l2 by default)
    r"[^\(]+\((?P<default_bd>[^\"\']+)(\sby\sdefault\)?)|"
    # 'by-default-double-tick', matches: "l1" or "l2" ("l2" by default)
    r"[^\(]+\(\"(?P<default_bd_dt>.+)\"\sby\sdefault\)?|"
    # 'by-default-single-tick', matches: 'l1' or 'l2' ('l2' by default)
    r"[^\(]+\(\'(?P<default_bd_st>.+)\'\sby\sdefault\)?"
)


def _param_split(params):
    return (p.strip(' ,') for p in shlex.split(params))


def _get_span(s, pattern):
    """Return the span of the first group that matches the pattern."""
    i, j = -1, -1

    match = pattern.match(s)
    if not match:
        return i, j

    for group_name in pattern.groupindex:
        i, j = match.span(group_name)
        if (i, j) != (-1, -1):
            return i, j

    return i, j


def _substitute_default(s, new_value):
    """Replace the default value in a parameter docstring by a new value.

    The docstring must conform to the numpydoc style and have the form
    "something (keyname=<value-to-replace>)"

    If no matching pattern is found or ``new_value`` is None, return
    the input untouched.

    Examples
    --------
    >>> _substitute_default('int (default=128)', 256)
    'int (default=256)'
    >>> _substitute_default('nonlin (default = ReLU())', nn.Hardtanh(1, 2))
    'nonlin (default = Hardtanh(min_val=1, max_val=2))'

    """
    if new_value is None:
        return s

    # BB: ideally, I would like to replace the 'default*' group
    # directly but I haven't found a way to do this
    i, j = _get_span(s, pattern=P_DEFAULTS)
    if (i, j) == (-1, -1):
        return s
    return '{}{}{}'.format(s[:i], new_value, s[j:])


def _parse_args_kwargs(params):
    args = ()
    kwargs = {}
    for param in _param_split(params):
        if '=' not in param:
            args += (DefaultParseValue(param),)
        else:
            k, v = param.split('=')
            kwargs[k.strip()] = DefaultParseValue(v)
    return args, kwargs


def _resolve_dotted_name(dotted_name):
    """Returns objects from strings

    Deals e.g. with 'torch.nn.Softmax(dim=-1)'.

    Modified from palladium:

    https://github.com/ottogroup/palladium/blob/8a066a9a7690557d9b1b6ed54b7d1a1502ba59e3/palladium/util.py

    with added support for instantiated objects.

    """
    if not isinstance(dotted_name, str):
        return dotted_name

    if '.' not in dotted_name:
        return dotted_name

    args = None
    params = None
    match = P_PARAMS.match(dotted_name)
    if match:
        dotted_name = match.group('name')
        params = match.group('params')

    module, name = dotted_name.rsplit('.', 1)
    attr = import_module(module)
    attr = getattr(attr, name)

    if params:
        args, kwargs = _parse_args_kwargs(params[1:-1])
        attr = attr(*args, **kwargs)

    return attr


def parse_net_kwargs(kwargs):
    """Parse arguments for the estimator.

    Resolves dotted names and instantiated classes.

    Examples
    --------
    >>> kwargs = {'lr': 0.1, 'module__nonlin': 'torch.nn.Hardtanh(-2, max_val=3)'}
    >>> parse_net_kwargs(kwargs)
    {'lr': 0.1, 'module__nonlin': Hardtanh(min_val=-2, max_val=3)}

    """
    if not kwargs:
        return kwargs

    resolved = {}
    for k, v in kwargs.items():
        resolved[k] = _resolve_dotted_name(v)

    return resolved


def _yield_preproc_steps(model):
    if not isinstance(model, Pipeline):
        return

    for key, val in model.get_params().items():
        if isinstance(val, BaseEstimator):
            if not isinstance(val, (Pipeline, FeatureUnion)):
                yield key, val


def _yield_estimators(model):
    """Yield estimators and their prefixes from the model.

    First, pipeline preprocessing steps are yielded (if there are
    any). Next the neural net is yielded. Finally, the module is
    yielded.

    """
    yield from _yield_preproc_steps(model)

    net_prefixes = []
    module_prefixes = []

    if isinstance(model, Pipeline):
        name = model.steps[-1][0]
        net_prefixes.append(name)
        module_prefixes.append(name)
        net = model.steps[-1][1]
    else:
        net = model

    yield '__'.join(net_prefixes), net

    module = net.module
    module_prefixes.append('module')
    yield '__'.join(module_prefixes), module


def _extract_estimator_cls(estimator):
    if isinstance(estimator, partial):
        # is partialled
        return estimator.func
    if not isinstance(estimator, type):
        # is instance
        return estimator.__class__
    return estimator


def _yield_printable_params(param, prefix, defaults):
    name, default, descr = param
    name = name if not prefix else '__'.join((prefix, name))
    default = _substitute_default(default, defaults.get(name))

    printable = '--{} : {}'.format(name, default)
    yield printable

    for line in descr:
        yield line


def _get_help_for_params(params, prefix='--', defaults=None, indent=2):
    defaults = defaults or {}
    for param in params:
        first, *rest = tuple(_yield_printable_params(
            param, prefix=prefix, defaults=defaults))
        yield " " * indent + first
        for line in rest:
            yield " " * 2 * indent + line


def _get_help_for_estimator(prefix, estimator, defaults=None):
    """Yield help lines for the given estimator and prefix."""
    defaults = defaults or {}
    estimator = _extract_estimator_cls(estimator)
    yield "<{}> options:".format(estimator.__name__)

    doc = ClassDoc(estimator)
    yield from _get_help_for_params(
        doc['Parameters'],
        prefix=prefix,
        defaults=defaults,
    )
    yield ''  # add a blank line between estimators


def print_help(model, defaults=None):
    """Print help for the command line arguments of the given model.

    Parameters
    ----------
    model : sklearn.base.BaseEstimator
        The basic model, e.g. a ``NeuralNet`` or sklearn ``Pipeline``.

    defaults : dict or None (default=None)
        Optionally, change the default values to use custom
        defaults. Command line arguments have precedence over defaults.

    """
    defaults = defaults or {}

    print("This is the help for the model-specific parameters.")
    print("To invoke help for the remaining options, run:")
    print("python {} -- --help".format(sys.argv[0]))
    print()

    lines = (_get_help_for_estimator(prefix, estimator, defaults=defaults) for
             prefix, estimator in _yield_estimators(model))
    print('\n'.join(chain(*lines)))


def parse_args(kwargs, defaults=None):
    """Apply command line arguments or show help.

    Use this in conjunction with the fire library to quickly build
    command line interfaces for your scripts.

    This function returns another function that must be called with
    the estimator (e.g. ``NeuralNet``) to apply the parsed command
    line arguments. If the --help option is found, show the
    estimator-specific help instead.

    Examples
    --------
    Content of my_script.py:

    >>> def main(**kwargs):
    >>>     X, y = get_data()
    >>>     my_model = get_model()
    >>>     parsed = parse_args(kwargs)
    >>>     my_model = parsed(my_model)
    >>>     my_model.fit(X, y)
    >>>
    >>> if __name__ == '__main__':
    >>>     fire.Fire(main)

    Parameters
    ----------
    kwargs : dict
        The arguments as parsed by fire.

    defaults : dict or None (default=None)
        Optionally, change the default values to use custom
        defaults. Command line arguments have precedence over defaults.

    Returns
    -------
    print_help_and_exit : callable
        If --help is in the arguments, print help and exit.

    set_params : callable
        If --help is not in the options, apply command line arguments to
        the estimator and return it.

    """
    defaults = defaults or {}

    def print_help_and_exit(estimator):
        print_help(estimator, defaults=defaults)
        sys.exit()

    def set_params(estimator):
        estimator.set_params(**defaults)
        return estimator.set_params(**parse_net_kwargs(kwargs))

    if kwargs.get('help'):
        return print_help_and_exit
    return set_params
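To make the docstring variants targeted by ``P_DEFAULTS`` concrete, here is a short sketch; the exact cases and expected outputs below are taken from the unit tests in skorch/tests/test_cli.py:

```python
from skorch.cli import _substitute_default

# 'standard' variant: value inside parentheses with an equals sign
print(_substitute_default('int (default=128)', 256))
# -> int (default=256)

# 'no equal' variant, as found e.g. in sklearn's MinMaxScaler docstring
print(_substitute_default('boolean, optional, default True', False))
# -> boolean, optional, default False

# 'by default' variant with single ticks, e.g. sklearn's Normalizer
print(_substitute_default("'l1', 'l2', or 'max', optional ('l2' by default)", 'l1'))
# -> 'l1', 'l2', or 'max', optional ('l1' by default)
```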

skorch/helper.py

+1

@@ -10,6 +10,7 @@

 from skorch.utils import _make_split
 from skorch.utils import _make_optimizer
+from skorch.cli import parse_args
 from skorch.utils import is_torch_data_type
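With ``parse_args`` re-exported from ``skorch.helper``, the non-help code path reduces to the two ``set_params`` calls seen in skorch/cli.py above: defaults first, parsed kwargs second, so command line arguments win. A sketch using a plain sklearn estimator, which is enough here because this path only requires ``set_params``:

```python
from sklearn.linear_model import LogisticRegression
from skorch.helper import parse_args

clf = LogisticRegression()
# No 'help' key in kwargs, so parse_args returns the set_params closure.
apply_args = parse_args({'C': 0.5}, defaults={'max_iter': 200})
clf = apply_args(clf)
print(clf.C, clf.max_iter)  # 0.5 200
```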

skorch/tests/test_cli.py

+321

@@ -0,0 +1,321 @@

"""Tests for cli.py"""

from math import cos
import os
import subprocess
from unittest.mock import Mock
from unittest.mock import patch

import numpy as np
import pytest
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from torch import nn
from torch.nn import RReLU


fire_installed = True
try:
    import fire
except ImportError:
    fire_installed = False


@pytest.mark.skipif(not fire_installed, reason='fire library not installed')
class TestCli:
    @pytest.fixture
    def resolve_dotted_name(self):
        from skorch.cli import _resolve_dotted_name
        return _resolve_dotted_name

    @pytest.mark.parametrize('name, expected', [
        (0, 0),
        (1.23, 1.23),
        ('foo', 'foo'),
        ('math.cos', cos),
        ('torch.nn', nn),
        ('torch.nn.ReLU', nn.ReLU),
    ])
    def test_resolve_dotted_name(self, resolve_dotted_name, name, expected):
        result = resolve_dotted_name(name)
        assert result == expected

    def test_resolve_dotted_name_instantiated(self, resolve_dotted_name):
        result = resolve_dotted_name('torch.nn.RReLU(0.123, upper=0.456)')
        assert isinstance(result, RReLU)
        assert np.isclose(result.lower, 0.123)
        assert np.isclose(result.upper, 0.456)

    @pytest.fixture
    def parse_net_kwargs(self):
        from skorch.cli import parse_net_kwargs
        return parse_net_kwargs

    def test_parse_net_kwargs(self, parse_net_kwargs):
        kwargs = {
            'lr': 0.05,
            'max_epochs': 5,
            'module__num_units': 10,
            'module__nonlin': 'torch.nn.RReLU(0.123, upper=0.456)',
        }
        parsed_kwargs = parse_net_kwargs(kwargs)

        assert len(parsed_kwargs) == 4
        assert np.isclose(parsed_kwargs['lr'], 0.05)
        assert parsed_kwargs['max_epochs'] == 5
        assert parsed_kwargs['module__num_units'] == 10
        assert isinstance(parsed_kwargs['module__nonlin'], RReLU)
        assert np.isclose(parsed_kwargs['module__nonlin'].lower, 0.123)
        assert np.isclose(parsed_kwargs['module__nonlin'].upper, 0.456)

    @pytest.fixture
    def net_cls(self):
        from skorch import NeuralNetClassifier
        return NeuralNetClassifier

    @pytest.fixture
    def net(self, net_cls, classifier_module):
        return net_cls(classifier_module)

    @pytest.fixture
    def pipe(self, net):
        return Pipeline([
            ('features', FeatureUnion([
                ('scale', MinMaxScaler()),
            ])),
            ('net', net),
        ])

    @pytest.fixture
    def yield_estimators(self):
        from skorch.cli import _yield_estimators
        return _yield_estimators

    def test_yield_estimators_net(self, yield_estimators, net):
        result = list(yield_estimators(net))

        assert result[0][0] == ''
        assert result[0][1] is net
        assert result[1][0] == 'module'
        assert result[1][1] is net.module

    def test_yield_estimators_pipe(self, yield_estimators, pipe):
        result = list(yield_estimators(pipe))
        scaler = pipe.named_steps['features'].transformer_list[0][1]
        net = pipe.named_steps['net']
        module = net.module

        assert result[0][0] == 'features__scale'
        assert result[0][1] is scaler
        assert result[1][0] == 'net'
        assert result[1][1] is net
        assert result[2][0] == 'net__module'
        assert result[2][1] is module

    @pytest.fixture
    def substitute_default(self):
        from skorch.cli import _substitute_default
        return _substitute_default

    @pytest.mark.parametrize('s, new_value, expected', [
        ('', '', ''),
        ('', 'foo', ''),
        ('bar', 'foo', 'bar'),
        ('int (default=128)', '', 'int (default=)'),
        ('int (default=128)', None, 'int (default=128)'),
        ('int (default=128)', '""', 'int (default="")'),
        ('int (default=128)', '128', 'int (default=128)'),
        ('int (default=128)', '256', 'int (default=256)'),
        ('int (default=128)', 256, 'int (default=256)'),
        ('with_parens (default=(1, 2))', (3, 4), 'with_parens (default=(3, 4))'),
        ('int (default =128)', '256', 'int (default =256)'),
        ('int (default= 128)', '256', 'int (default= 256)'),
        ('int (default = 128)', '256', 'int (default = 256)'),
        (
            'nonlin (default = ReLU())',
            nn.Hardtanh(1, 2),
            'nonlin (default = {})'.format(nn.Hardtanh(1, 2))
        ),
        (
            # from sklearn MinMaxScaler
            'tuple (min, max), default=(0, 1)',
            (-1, 1),
            'tuple (min, max), default=(-1, 1)'
        ),
        (
            # from sklearn MinMaxScaler
            'boolean, optional, default True',
            False,
            'boolean, optional, default False'
        ),
        (
            # from sklearn Normalizer
            "'l1', 'l2', or 'max', optional ('l2' by default)",
            'l1',
            "'l1', 'l2', or 'max', optional ('l1' by default)"
        ),
        (
            # same but double ticks
            '"l1", "l2", or "max", optional ("l2" by default)',
            'l1',
            '"l1", "l2", or "max", optional ("l1" by default)'
        ),
        (
            # same but no ticks
            "l1, l2, or max, optional (l2 by default)",
            'l1',
            "l1, l2, or max, optional (l1 by default)"
        ),
        (
            "tuple, optional ((1, 1) by default)",
            (2, 2),
            "tuple, optional ((2, 2) by default)"
        ),
        (
            "nonlin (ReLU() by default)",
            nn.Tanh(),
            "nonlin (Tanh() by default)"
        ),
    ])
    def test_replace_default(self, substitute_default, s, new_value, expected):
        result = substitute_default(s, new_value)
        assert result == expected

    @pytest.fixture
    def print_help(self):
        from skorch.cli import print_help
        return print_help

    def test_print_help_net(self, print_help, net, capsys):
        print_help(net)
        out = capsys.readouterr()[0]

        expected_snippets = [
            '-- --help',
            '<NeuralNetClassifier> options',
            '--module : torch module (class or instance)',
            '--batch_size : int (default=128)',
            '<MLPModule> options',
            '--module__hidden_units : int (default=10)',
        ]
        for snippet in expected_snippets:
            assert snippet in out

    def test_print_help_net_custom_defaults(self, print_help, net, capsys):
        defaults = {'batch_size': 256, 'module__hidden_units': 55}
        print_help(net, defaults)
        out = capsys.readouterr()[0]

        expected_snippets = [
            '-- --help',
            '<NeuralNetClassifier> options',
            '--module : torch module (class or instance)',
            '--batch_size : int (default=256)',
            '<MLPModule> options',
            '--module__hidden_units : int (default=55)',
        ]
        for snippet in expected_snippets:
            assert snippet in out

    def test_print_help_pipeline(self, print_help, pipe, capsys):
        print_help(pipe)
        out = capsys.readouterr()[0]

        expected_snippets = [
            '-- --help',
            '<MinMaxScaler> options',
            '--features__scale__feature_range',
            '<NeuralNetClassifier> options',
            '--net__module : torch module (class or instance)',
            '--net__batch_size : int (default=128)',
            '<MLPModule> options',
            '--net__module__hidden_units : int (default=10)',
        ]
        for snippet in expected_snippets:
            assert snippet in out

    def test_print_help_pipeline_custom_defaults(
            self, print_help, pipe, capsys):
        defaults = {'net__batch_size': 256, 'net__module__hidden_units': 55}
        print_help(pipe, defaults=defaults)
        out = capsys.readouterr()[0]

        expected_snippets = [
            '-- --help',
            '<MinMaxScaler> options',
            '--features__scale__feature_range',
            '<NeuralNetClassifier> options',
            '--net__module : torch module (class or instance)',
            '--net__batch_size : int (default=256)',
            '<MLPModule> options',
            '--net__module__hidden_units : int (default=55)',
        ]
        for snippet in expected_snippets:
            assert snippet in out

    @pytest.fixture
    def parse_args(self):
        from skorch.cli import parse_args
        return parse_args

    @pytest.fixture
    def estimator(self, net_cls):
        mock = Mock(net_cls)
        return mock

    def test_parse_args_help(self, parse_args, estimator):
        with patch('skorch.cli.sys.exit') as exit:
            with patch('skorch.cli.print_help') as help:
                parsed = parse_args({'help': True, 'foo': 'bar'})
                parsed(estimator)

        # set_params is never called, neither for defaults nor for kwargs
        assert estimator.set_params.call_count == 0
        assert help.call_count == 1
        assert exit.call_count == 1

    def test_parse_args_run(self, parse_args, estimator):
        kwargs = {'foo': 'bar', 'baz': 'math.cos'}
        with patch('skorch.cli.sys.exit') as exit:
            with patch('skorch.cli.print_help') as help:
                parsed = parse_args(kwargs)
                parsed(estimator)

        assert estimator.set_params.call_count == 2  # defaults and kwargs

        defaults_set_params = estimator.set_params.call_args_list[0][1]
        assert not defaults_set_params  # no defaults specified

        kwargs_set_params = estimator.set_params.call_args_list[1][1]
        assert kwargs_set_params['foo'] == 'bar'
        assert kwargs_set_params['baz'] == cos

        assert help.call_count == 0
        assert exit.call_count == 0

    def test_parse_args_net_custom_defaults(self, parse_args, net):
        defaults = {'batch_size': 256, 'module__hidden_units': 55}
        kwargs = {'batch_size': 123, 'module__nonlin': nn.Hardtanh(1, 2)}
        parsed = parse_args(kwargs, defaults)
        net = parsed(net)

        # cmd line args have precedence over defaults
        assert net.batch_size == 123
        assert net.module_.hidden_units == 55
        assert isinstance(net.module_.nonlin, nn.Hardtanh)
        assert net.module_.nonlin.min_val == 1
        assert net.module_.nonlin.max_val == 2

    def test_parse_args_pipe_custom_defaults(self, parse_args, pipe):
        defaults = {'net__batch_size': 256, 'net__module__hidden_units': 55}
        kwargs = {'net__batch_size': 123, 'net__module__nonlin': nn.Hardtanh(1, 2)}
        parsed = parse_args(kwargs, defaults)
        pipe = parsed(pipe)
        net = pipe.steps[-1][1]

        # cmd line args have precedence over defaults
        assert net.batch_size == 123
        assert net.module_.hidden_units == 55
        assert isinstance(net.module_.nonlin, nn.Hardtanh)
        assert net.module_.nonlin.min_val == 1
        assert net.module_.nonlin.max_val == 2
