
Update model configs - Allow setters for common properties #13026


Merged: 33 commits, Sep 6, 2021

Conversation

@nreimers (Contributor) commented Aug 6, 2021

Update model configs - Allow setters for common properties

Not all models use the same naming for config values; e.g., hidden_size is called n_embd in GPT2Config. So far, getters had been implemented in the config classes so that a GPT2Config can be read via config.hidden_size.

But the setters were missing, so this code currently fails:

from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4  # Fails

config = GPT2Config(hidden_size=4)  # Fails

Changes

This PR adds an attribute_map to the config classes that maps common attribute names to their model-specific counterparts. For GPT2, this map looks like this:

attribute_map = {"hidden_size": "n_embd",
                     "max_position_embeddings": "n_positions",
                     "num_attention_heads": "n_head",
                     "num_hidden_layers": "n_layer"
                     }

The PretrainedConfig class overrides __setattr__ and __getattribute__ to check for mappings in the attribute_map:

    def __setattr__(self, key, value):
        # Translate a mapped name (e.g. hidden_size) to its model-specific
        # counterpart (e.g. n_embd) before setting the attribute.
        if key in super().__getattribute__('attribute_map'):
            key = super().__getattribute__('attribute_map')[key]
        super().__setattr__(key, value)

    def __getattribute__(self, key):
        # Guard against infinite recursion when looking up attribute_map itself.
        if key != 'attribute_map' and key in super().__getattribute__('attribute_map'):
            key = super().__getattribute__('attribute_map')[key]
        return super().__getattribute__(key)
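
For illustration, a quick check of the aliasing behavior (a minimal sketch; the full test script appears under "Code to test the change" below):

from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4      # __setattr__ routes this to n_embd via attribute_map
assert config.n_embd == 4   # both names now resolve to the same value
assert config.hidden_size == 4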

Advantages

  • Setters work, i.e. you can use config.hidden_size = 4 and GPT2Config(hidden_size=4)
  • No need to write individual getter or setter methods in the config classes; they are derived from the attribute_map

Detailed changes

  • PretrainedConfig: Add __setattr__ and __getattribute__ methods. Added a docstring for attribute_map
  • GPT2Config: Add the attribute map, remove the old getters
  • test_configuration_common.py: Update the create_and_test_config_common_properties method so that it tests that setters exist and work (see the sketch below)
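
For reference, a rough sketch of what the updated common-properties test might check (the property list and helper names here are assumptions, not the exact diff):

def create_and_test_config_common_properties(self):
    config = self.config_class(**self.inputs_dict)
    # Properties every config is expected to expose under the common names
    common_properties = ["hidden_size", "num_attention_heads", "num_hidden_layers"]
    for prop in common_properties:
        # Getter: the common name must be readable on every config
        self.parent.assertTrue(hasattr(config, prop), msg=f"`{prop}` does not exist")
        # Setter: writing through the common name must round-trip
        setattr(config, prop, 7)
        self.parent.assertEqual(getattr(config, prop), 7, msg=f"`{prop}` was not set correctly")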

Work in Progress

So far I have only updated GPT2Config to get your feedback. Unit tests for config classes that have not yet been updated (i.e., that don't provide setters for the common fields), like the GPTNeo config class, will fail.

Once the design of the solution is approved, I will update all other config classes.

Update: All config classes updated

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@sgugger @LysandreJik @NielsRogge

Code to test the change

Besides the unit tests, you can use this code to test the changes quickly:

from transformers import GPT2Config

config = GPT2Config()

config.hidden_size = 4
print("Hidden size", config.hidden_size, config.n_embd)

config.n_positions = 65
print("n_positions", config.max_position_embeddings, config.n_positions)

config.max_position_embeddings = 123
print("n_positions", config.max_position_embeddings, config.n_positions)


print("\n\n================\n\n")

## Note: conflicting arguments: hidden_size and n_embd are identical fields
# In that case, the synonym (hidden_size) will have higher priority
config = GPT2Config(hidden_size=4, n_embd=20, max_position_embeddings=80)
print("Hidden size", config.hidden_size, config.n_embd)
print("n_positions", config.max_position_embeddings, config.n_positions)

print("Export to json")
config.save_pretrained(".")

## Load config
print("Load from disc")
config = GPT2Config.from_pretrained('.')
print("Hidden size", config.hidden_size, config.n_embd)
print("n_positions", config.max_position_embeddings, config.n_positions)
assert config.hidden_size == config.n_embd
assert config.hidden_size == 4

assert config.max_position_embeddings == config.n_positions
assert config.max_position_embeddings == 80

@nreimers changed the title from "[DRAFT] Update model configs - Allow setters for common properties" to "[WIP] Update model configs - Allow setters for common properties" on Aug 6, 2021
@sgugger (Collaborator) left a comment

The design looks good to me! I think we could have a few more common attributes, since we are in the process of adding them:

  • the vocab size (seems to be pretty consistent)
  • the embedding size
  • the inner size for the feed-forward layers

Those on top of max_position_embeddings should all be included in common_properties so that we are sure they are common to each model.

@nreimers (Contributor, Author) commented Aug 6, 2021

> The design looks good to me! I think we could have a few more common attributes, since we are in the process of adding them:
>
>   • the vocab size (seems to be pretty consistent)
>   • the embedding size
>   • the inner size for the feed-forward layers
>
> Those on top of max_position_embeddings should all be included in common_properties so that we are sure they are common to each model.

My idea was to put this into an independent, new PR and to keep this PR focused on just changing the getters/setters.

My plan is to come up with a scheme for which attributes should be common. Here we can differentiate between model types: text (split into encoder-only and encoder-decoder), image, and audio.

I analyzed all 50+ config classes; these are the most common fields, with the number of config classes that define each:

model_type 55
vocab_size 51
architectures 49
pad_token_id 42
max_position_embeddings 41
num_hidden_layers 40
initializer_range 36
eos_token_id 34
bos_token_id 32
hidden_size 32
layer_norm_eps 32
hidden_act 30
intermediate_size 30
num_attention_heads 29
hidden_dropout_prob 28
attention_probs_dropout_prob 26
transformers_version 25
type_vocab_size 23
attention_dropout 22
gradient_checkpointing 21
dropout 19
activation_dropout 18
d_model 17
init_std 17
activation_function 16

But as mentioned, I would put this in another PR.

@nreimers changed the title from "[WIP] Update model configs - Allow setters for common properties" to "Update model configs - Allow setters for common properties" on Aug 30, 2021
@nreimers (Contributor, Author) commented

Hi @sgugger @LysandreJik @patil-suraj @patrickvonplaten

I have also updated all the other config classes so that they use the attribute_map, meaning common properties (like hidden_size) can be set (config.hidden_size = 123) or passed as an argument (MyConfigClass(hidden_size=123)).

I kept the behavior of the config classes as is, i.e. no new getter methods were added; the config classes were just extended to allow setting the common properties.

If a setter cannot be implemented for a class, an exception is raised; see the sketch below.
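
For illustration, a minimal sketch of that pattern, modeled on configs such as Reformer where num_hidden_layers is derived from another field (the class name here is hypothetical, not the exact diff):

from transformers import PretrainedConfig

class DerivedLayersConfig(PretrainedConfig):
    # Hypothetical config whose layer count is derived from attn_layers,
    # so there is nothing sensible for a setter to assign.
    @property
    def num_hidden_layers(self):
        return len(self.attn_layers)

    @num_hidden_layers.setter
    def num_hidden_layers(self, value):
        raise NotImplementedError(
            "This model does not support setting num_hidden_layers directly; "
            "set attn_layers instead."
        )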

All unit tests are passing.

Would be happy if you could have a look at this PR.

@sgugger (Collaborator) left a comment

Thanks for rolling this out on all models. I think the last thing missing is some documentation in model_classes/config.rst, to list all the common attributes that can be used/set.

@patrickvonplaten (Contributor) left a comment

This looks good to me - thanks a lot for your work here!

One thing I'm wondering is whether we should allow this:

config = GPT2Config(hidden_size=4, n_embd=20, max_position_embeddings=80)

at all, or just throw an error directly. I think it's actually cleaner to throw an error here instead of silently "disregarding" n_embd=20. What do you think, @nreimers?

@nreimers (Contributor, Author) commented Aug 31, 2021

@sgugger Will add a note to the docs

@patrickvonplaten Throwing an error is not easy.

GPT2Config defines n_embd=768 in its __init__ method, so

config = GPT2Config(hidden_size=4)

and

config = GPT2Config(hidden_size=4, n_embd=768)

are identical calls. We would expect the first call to work.

In order to throw an exception for the second call, we could do one of the following:

  • Replace all default parameters with None; if hidden_size is not set, then set n_embd to 768 (sketched below). => Major refactoring of all config classes would be needed, with quite a lot of overhead. Further, default values would no longer be visible from the method definition.
  • Check if n_embd != hidden_size and n_embd != 768. => config = GPT2Config(hidden_size=4, n_embd=8) would throw an error, but config = GPT2Config(hidden_size=4, n_embd=768) would not (also not a nice solution). Major refactoring would also be needed, as we would have to keep track of the default values for all parameters.

Do you have other ideas for how this could be checked?
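
For reference, a minimal sketch of the first option (sentinel defaults); the class name is hypothetical and this is not code from the PR:

from transformers import PretrainedConfig

class SentinelGPT2Config(PretrainedConfig):
    # Hypothetical variant that uses None as a sentinel to detect whether
    # the caller explicitly passed n_embd.
    def __init__(self, n_embd=None, **kwargs):
        if n_embd is not None and "hidden_size" in kwargs:
            raise ValueError("Pass either hidden_size or n_embd, not both.")
        # The real default (768) is applied here, so it is no longer visible
        # in the method signature (the drawback noted above).
        self.n_embd = kwargs.pop("hidden_size", n_embd if n_embd is not None else 768)
        super().__init__(**kwargs)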

@LysandreJik (Member) left a comment

I played with it extensively, and it's working great! Thanks for working on this, @nreimers.

Regarding @patrickvonplaten's comment, I think it's fine as it is but it could maybe use a note in the documentation.

The most important use case for two aliased variables, IMO, is instantiating a config from a pretrained config while overriding an element, as in this scenario:

config = GPT2Config.from_pretrained("gpt2")
# config.n_embd: 768

config = GPT2Config.from_pretrained("gpt2", hidden_size=2)
# config.n_embd: 2

And this works flawlessly.

@patrickvonplaten (Contributor) left a comment

Fine by me!

@patil-suraj (Contributor) left a comment

Very clean implementation. Thanks a lot for working on this!

@nreimers (Contributor, Author) commented Sep 1, 2021

@sgugger I updated the docs (transformers/docs/source/main_classes/configuration.rst) and added a section on the common attributes. Please have a look.

@nreimers (Contributor, Author) commented Sep 6, 2021

Hi,
I just updated the PR with the newest commits from the master branch.

However, the run_examples_torch job now fails in CircleCI:

==================================== ERRORS ====================================
______________ ERROR collecting examples/pytorch/test_examples.py ______________
ImportError while importing test module '/home/circleci/transformers/examples/pytorch/test_examples.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.6/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/pytorch/test_examples.py:51: in <module>
    import run_image_classification
examples/pytorch/image-classification/run_image_classification.py:27: in <module>
    from torchvision.transforms import (
E   ModuleNotFoundError: No module named 'torchvision'

Not sure why this happens, as this PR does not touch run_image_classification.py.

Is this an issue with CircleCI or with the specific unit test?

@patil-suraj (Contributor) commented Sep 6, 2021

Hi @nreimers, it's not related to this PR. That test fails because torchvision is not installed on the CI (it is required by run_image_classification.py) for the examples tests. I've proposed a fix here: #13438

@nreimers (Contributor, Author) commented Sep 6, 2021

Hi @patil-suraj
Thanks for the quick response.

What are the next steps for this PR? Wait until #13438 is merged and then, once all tests pass, merge this PR?

Who will be merging this PR? Should I do it once all tests are passing?

@patil-suraj (Contributor) commented

The failed test is not related to this PR, and all of us have approved it, so feel free to merge if everything is ready :)
