This repository was archived by the owner on Jul 1, 2023. It is now read-only.

[Optimizer] Simplify optimizers using generalized vector math. #218

Merged
merged 15 commits into tensorflow:master from rxwei:one-line-optimizers on Jun 27, 2019

Conversation

rxwei
Contributor

@rxwei rxwei commented Jun 12, 2019

This is the start of a series of PRs that make optimizers no longer depend on KeyPathIterable or require AllDifferentiableVariables to equal TangentVector. This is possible because we recently overhauled generalized vector math.

Changes include:

  • Define an extension for VectorProtocol that defines arithmetic operators in terms of adding(_:), subtracting(_:), and scaled(by:) (see the sketch after this list).
  • Change SGD.update(_:along:) to use vector math.
  • Make Optimizer.update(_:along:) take inout Model instead of inout Model.AllDifferentiableVariables. This makes it easy to deprecate AllDifferentiableVariables later.
  • Add an update(_:along:) that takes inout Model so optimizers conform to the protocol, without removing the existing implementation. This is for short-term source compatibility.
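
For illustration, here is a hedged sketch of what such an operator extension could look like, assuming broadcasting adding(_:)/subtracting(_:) and scaled(by:) requirements on VectorProtocol (names follow this PR's description; the shipped extension may differ):

// Hedged sketch, not the exact library code: operators defined purely in
// terms of the VectorProtocol requirements named above.
extension VectorProtocol {
    static func + (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.adding(rhs) }
    static func - (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.subtracting(rhs) }
    static func * (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.scaled(by: rhs) }
    static func * (lhs: VectorSpaceScalar, rhs: Self) -> Self { rhs.scaled(by: lhs) }
}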

@rxwei rxwei added the enhancement New feature or request label Jun 12, 2019
@rxwei
Contributor Author

rxwei commented Jun 12, 2019

-        velocity = model.allDifferentiableVariables
-        for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
-            velocity[keyPath: kp].resetToZero()
-        }
-        for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Double>.self) {
-            velocity[keyPath: kp].resetToZero()
-        }

A question one might bring up is: would deleting this logic make it impossible to optimize models that contain layers/parameters nested within arrays? The answer is: no need to worry at this point! What makes generalized vector math different is that all operations on these aggregate tangent vectors strictly follow their VectorProtocol conformances. Array.TangentVector is defined to handle .zero and count differences, so I expect vector operations on velocity to transform it into the right shape. A lot of the implementation details we used to worry about with key path iteration are no longer a problem.
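
To make the .zero / count-difference point concrete, here is a hypothetical aggregate tangent vector (purely illustrative, not the standard library's Array.TangentVector): its additive identity carries no shape, and the arithmetic absorbs the mismatch, so a velocity initialized to .zero picks up the right shape after the first vector operation.

// Hypothetical illustration only; Array's real TangentVector differs in detail.
struct ExampleArrayTangent: AdditiveArithmetic, Equatable {
    var elements: [Float]

    // The additive identity carries no shape information.
    static var zero: ExampleArrayTangent { ExampleArrayTangent(elements: []) }

    static func + (lhs: ExampleArrayTangent, rhs: ExampleArrayTangent) -> ExampleArrayTangent {
        // An empty side behaves like zero, so the shape materializes lazily.
        if lhs.elements.isEmpty { return rhs }
        if rhs.elements.isEmpty { return lhs }
        return ExampleArrayTangent(elements: zip(lhs.elements, rhs.elements).map { a, b in a + b })
    }

    static func - (lhs: ExampleArrayTangent, rhs: ExampleArrayTangent) -> ExampleArrayTangent {
        if rhs.elements.isEmpty { return lhs }
        if lhs.elements.isEmpty { return ExampleArrayTangent(elements: rhs.elements.map { -$0 }) }
        return ExampleArrayTangent(elements: zip(lhs.elements, rhs.elements).map { a, b in a - b })
    }
}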

@Mistobaan
Contributor

Should we wait for this patch before contributing more Optimizers?

@Shashi456
Contributor

@Mistobaan I think ideally yes, because if you add any optimizers now, they'll need to be cleaned up in this PR.

rxwei added a commit to swiftlang/swift that referenced this pull request Jun 18, 2019
…o 'VectorProtocol'. (#25525)

- Add broadcasting 'adding(_:)' and 'subtracting(_:)' to 'VectorProtocol'. These are important for aggregate algorithms such as machine learning optimizers (tensorflow/swift-apis#218).
- Make their implementations compiler-derivable.
- Add operators for `adding(_:)` and `subtracting(_:)` in a protocol extension.
- Comment out all operators in protocol extensions for `VectorProtocol` because a source breakage has been found.
@eaplatanios
Contributor

@rxwei What is currently blocking this?

@eaplatanios
Contributor

This PR shows that TangentVector conforms to VectorProtocol for optimizers. In this case, would it make sense to rename TangentVector to just Tangent or TangentSpace to avoid confusion?

@rxwei
Contributor Author

rxwei commented Jun 25, 2019

This PR shows that TangentVector conforms to VectorProtocol for optimizers. In this case, would it make sense to rename TangentVector to just Tangent or TangentSpace to avoid confusion?

First, derivatives are always tangent vectors, so the term "tangent vector" itself is not a problem. Second, Swift type names describe the values of a type rather than the type or space itself, so type names do not end in "Space" or "Type".

The actual problem, however, is that TangentVector should be required to conform to VectorProtocol, because it is a vector space. That has not been done yet because VectorProtocol may not belong in the Swift standard library, and differentiation only needs AdditiveArithmetic (which makes adding VectorProtocol to the stdlib hard to justify).

rxwei added 2 commits June 24, 2019 22:44
…ne-optimizers

# Conflicts:
#	Sources/TensorFlow/Operators/Math.swift
#	Sources/TensorFlow/Optimizer.swift
@rxwei rxwei marked this pull request as ready for review June 27, 2019 07:24
@rxwei rxwei merged commit 12d7030 into tensorflow:master Jun 27, 2019
@rxwei rxwei deleted the one-line-optimizers branch June 27, 2019 07:26
@Shashi456 Shashi456 mentioned this pull request Jun 28, 2019
@eaplatanios
Contributor

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

Yes, they need to be implemented with constraints to the PointwiseMultiplicative protocol.

@eaplatanios
Contributor

Should I go ahead and open a PR for a new optimizer or wait until you push the updates?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Feel free to open new PRs!

@dan-zheng
Member

dan-zheng commented Jun 28, 2019

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

Yes, they need to be implemented with constraints to the PointwiseMultiplicative protocol.

To clarify the direction: optimizers can use:

  • TangentVector addition/subtraction via AdditiveArithmetic conformances.
  • TangentVector multiplication/division via PointwiseMultiplicative conformances.
  • TangentVector elementary functions via ElementaryFunctions conformances.
  • TangentVector scalar addition/subtraction/multiplication via VectorProtocol conformances.
  • Update Model with Model.TangentVector via Differentiable.move(along:).

One-line optimizer (Riemann SGD):

model.move(along: direction.scaled(by: -learningRate))
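
As a slightly larger hedged sketch in the same spirit (illustrative names and signature, not the library's final optimizer code), a momentum SGD step can be written entirely with these conformances and no key-path iteration, assuming Model.TangentVector.VectorSpaceScalar == Float:

// Hedged sketch: momentum SGD via generalized vector math only.
func momentumSGDStep<Model: Differentiable>(
    _ model: inout Model,
    along direction: Model.TangentVector,
    velocity: inout Model.TangentVector,
    learningRate: Float,
    momentum: Float
) where Model.TangentVector: VectorProtocol,
        Model.TangentVector.VectorSpaceScalar == Float {
    // velocity = momentum * velocity - learningRate * direction
    velocity = velocity.scaled(by: momentum) + direction.scaled(by: -learningRate)
    model.move(along: velocity)
}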

@eaplatanios
Contributor

@dan-zheng given that this functionality is already supported what is currently preventing you from simplifying other optimizers, such as Adam?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Adam and other optimizers are not blocked! This PR was done when PointwiseMultiplicative was not available :)

@eaplatanios
Contributor

I noticed that you reverted the changes that switched from key paths to the vector protocol, so I assumed that something was not working out. If everything works fine now, then why not move Adam and the other optimizers off key paths altogether?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

We just haven't gotten around to it. Contributions are welcome!

@eaplatanios
Contributor

I just gave it a quick look, and it seems I run into issues performing operations between Float (e.g., the learning rate) and TangentVector. If I change all the Floats to TangentVector.VectorSpaceScalar, then I can’t support the default values in the constructor, which are float literals. What’s a good workaround for this? I would like to avoid multiple extensions (e.g., for Float and Double), each with its own constructor.

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Tensor.VectorSpaceScalar is Float. This allows us to use Float scalars everywhere in optimizers. What specific operation was causing errors for you?

@eaplatanios
Contributor

For example:

error: binary operator '*' cannot be applied to operands of type 'Model.TangentVector' and 'Float'
        firstMoments = firstMoments * beta1
                       ~~~~~~~~~~~~ ^ ~~~~~

And if I switch to firstMoments.scaled(by: beta1) I get:

error: cannot invoke 'scaled' with an argument list of type '(by: Float)'
        firstMoments = firstMoments.scaled(by: beta1)
                                    ^

This is because we only have the constraint that Model.TangentVector: VectorProtocol. Should I add a constraint for Model.TangentVector.VectorSpaceScalar == Float? Would that support Tensor<Double> though?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Yes. And yes, it will support Tensor<Double>: Tensor’s VectorSpaceScalar is always Float.
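
For concreteness, a hedged sketch of the constraint being suggested (MyAdam and its members are illustrative, not the library's class):

// Hedged sketch: constraining the tangent vector's scalar type to Float lets
// plain Float hyperparameters (with Float-literal defaults) drive scaled(by:).
class MyAdam<Model: Differentiable>
    where Model.TangentVector: VectorProtocol,
          Model.TangentVector.VectorSpaceScalar == Float {
    var learningRate: Float
    var beta1: Float
    var firstMoments: Model.TangentVector = .zero

    init(learningRate: Float = 1e-3, beta1: Float = 0.9) {
        self.learningRate = learningRate
        self.beta1 = beta1
    }

    func updateFirstMoments(along direction: Model.TangentVector) {
        // firstMoments = beta1 * firstMoments + (1 - beta1) * direction
        firstMoments = firstMoments.scaled(by: beta1) + direction.scaled(by: 1 - beta1)
    }
}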

@eaplatanios
Contributor

Ok cool, thanks for the information. Another issue that pops up is that in Adam, for example, we need to compute sqrt(secondMoments), but sqrt is not supported by VectorProtocol. This is trickier, I believe. Have you thought of a workaround for that?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Adding a constraint to ElementaryFunctions will give you access to Model.TangentVector.sqrt(_:).
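
Putting the pieces together, a hedged sketch of an Adam-style step with all three constraints (illustrative only; the library's Adam differs, e.g., in bias correction):

// Hedged sketch of an Adam-style update using AdditiveArithmetic (+),
// VectorProtocol (scaled(by:), adding(_:)), PointwiseMultiplicative (.*,
// reciprocal), and ElementaryFunctions (sqrt).
func adamStep<Model: Differentiable>(
    _ model: inout Model,
    along direction: Model.TangentVector,
    firstMoments: inout Model.TangentVector,
    secondMoments: inout Model.TangentVector,
    learningRate: Float, beta1: Float, beta2: Float, epsilon: Float
) where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions,
        Model.TangentVector.VectorSpaceScalar == Float {
    firstMoments = firstMoments.scaled(by: beta1) + direction.scaled(by: 1 - beta1)
    secondMoments = secondMoments.scaled(by: beta2) +
        (direction .* direction).scaled(by: 1 - beta2)
    let denominator = Model.TangentVector.sqrt(secondMoments).adding(epsilon)
    model.move(along: (firstMoments .* denominator.reciprocal).scaled(by: -learningRate))
}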

Labels
enhancement New feature or request

5 participants