This repository was archived by the owner on Jul 1, 2023. It is now read-only.

[Optimizer] Simplify optimizers using generalized vector math. #218

Merged
merged 15 commits into tensorflow:master from rxwei:one-line-optimizers on Jun 27, 2019

Conversation

rxwei
Contributor

@rxwei rxwei commented Jun 12, 2019

This is the start of a series of PRs that make optimizers no longer depend on KeyPathIterable or require AllDifferentiableVariables to equal TangentVector. This is possible because we recently overhauled generalized vector math.

Changes include:

  • Define an extension for VectorProtocol that defines arithmetic operators in terms of adding(_:), subtracting(_:), and scaled(by:) (see the sketch after this list).
  • Change SGD.update(_:along:) to use vector math.
  • Make Optimizer.update(_:along:) take inout Model instead of inout Model.AllDifferentiableVariables. This makes it easy to deprecate AllDifferentiableVariables later.
  • Add an update(_:along:) that takes inout Model so optimizers conform to the protocol, without removing the existing implementation. This is for short-term source compatibility.
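
For illustration, here is a hedged sketch of what such an operator extension could look like, assuming broadcasting adding(_:)/subtracting(_:) and scaled(by:) requirements on VectorProtocol (names follow this PR's description; the shipped extension may differ):

// Hedged sketch, not the exact library code: operators defined purely in
// terms of the VectorProtocol requirements named above.
extension VectorProtocol {
    static func + (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.adding(rhs) }
    static func - (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.subtracting(rhs) }
    static func * (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.scaled(by: rhs) }
    static func * (lhs: VectorSpaceScalar, rhs: Self) -> Self { rhs.scaled(by: lhs) }
}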

@rxwei rxwei added the enhancement New feature or request label Jun 12, 2019
@rxwei
Contributor Author

rxwei commented Jun 12, 2019

-        velocity = model.allDifferentiableVariables
-        for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
-            velocity[keyPath: kp].resetToZero()
-        }
-        for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Double>.self) {
-            velocity[keyPath: kp].resetToZero()
-        }

A question one might bring up is: would deleting this logic make it impossible to optimize models that contain layers/parameters nested within arrays? The answer is: no need to worry at this point! What makes generalized vector math different is that all operations on these aggregate tangent vectors strictly follow their VectorProtocol conformances. Array.TangentVector is defined to handle .zero and count differences, so I expect vector operations on velocity to transform it into the right shape. A lot of the implementation details we used to worry about with key path iteration are no longer a problem.
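
To make the .zero / count-difference point concrete, here is a hypothetical aggregate tangent vector (purely illustrative, not the standard library's Array.TangentVector): its additive identity carries no shape, and the arithmetic absorbs the mismatch, so a velocity initialized to .zero picks up the right shape after the first vector operation.

// Hypothetical illustration only; Array's real TangentVector differs in detail.
struct ExampleArrayTangent: AdditiveArithmetic, Equatable {
    var elements: [Float]

    // The additive identity carries no shape information.
    static var zero: ExampleArrayTangent { ExampleArrayTangent(elements: []) }

    static func + (lhs: ExampleArrayTangent, rhs: ExampleArrayTangent) -> ExampleArrayTangent {
        // An empty side behaves like zero, so the shape materializes lazily.
        if lhs.elements.isEmpty { return rhs }
        if rhs.elements.isEmpty { return lhs }
        return ExampleArrayTangent(elements: zip(lhs.elements, rhs.elements).map { a, b in a + b })
    }

    static func - (lhs: ExampleArrayTangent, rhs: ExampleArrayTangent) -> ExampleArrayTangent {
        if rhs.elements.isEmpty { return lhs }
        if lhs.elements.isEmpty { return ExampleArrayTangent(elements: rhs.elements.map { -$0 }) }
        return ExampleArrayTangent(elements: zip(lhs.elements, rhs.elements).map { a, b in a - b })
    }
}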

@Mistobaan
Contributor

Should we wait for this patch before contributing more Optimizers?

@Shashi456
Contributor

@Mistobaan I think ideally yes, because if you add any optimizers now, they'll need to be cleaned up in this PR.

rxwei added a commit to swiftlang/swift that referenced this pull request Jun 18, 2019
…o 'VectorProtocol'. (#25525)

- Add broadcasting 'adding(_:)' and 'subtracting(_:)' to 'VectorProtocol'. These are important for aggregate algorithms such as machine learning optimizers (tensorflow/swift-apis#218).
- Make their implementations compiler-derivable.
- Add operators for `adding(_:)` and `subtracting(_:)` in a protocol extension.
- Comment out all operators in protocol extensions for `VectorProtocol` because a source breakage has been found.
@eaplatanios
Contributor

@rxwei What is currently blocking this?

@eaplatanios
Contributor

This PR shows that TangentVector conforms to VectorProtocol for optimizers. In this case, would it make sense to rename TangentVector to just Tangent or TangentSpace to avoid confusion?

@rxwei
Contributor Author

rxwei commented Jun 25, 2019

This PR shows that TangentVector conforms to VectorProtocol for optimizers. In this case, would it make sense to rename TangentVector to just Tangent or TangentSpace to avoid confusion?

First, derivatives are always tangent vectors, so the term "tangent vector" itself is not a problem. Second, Swift type names describe the values of a type rather than the type or space itself, so type names do not end in "Space" or "Type".

The actual problem, however, is that TangentVector should be required to conform to VectorProtocol, because it is a vector space. That has not been done yet because VectorProtocol may not belong in the Swift standard library, and differentiation only needs AdditiveArithmetic (which makes adding VectorProtocol to the stdlib hard to justify).

rxwei added 2 commits June 24, 2019 22:44
…ne-optimizers

# Conflicts:
#	Sources/TensorFlow/Operators/Math.swift
#	Sources/TensorFlow/Optimizer.swift
@rxwei rxwei marked this pull request as ready for review June 27, 2019 07:24
@rxwei rxwei merged commit 12d7030 into tensorflow:master Jun 27, 2019
@rxwei rxwei deleted the one-line-optimizers branch June 27, 2019 07:26
@Shashi456 Shashi456 mentioned this pull request Jun 28, 2019
@eaplatanios
Contributor

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

Yes, they need to be implemented with constraints to the PointwiseMultiplicative protocol.

@eaplatanios
Contributor

Should I go ahead and open a PR for a new optimizer or wait until you push the updates?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Feel free to open new PRs!

@dan-zheng
Member

dan-zheng commented Jun 28, 2019

@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon.

Yes, they need to be implemented with constraints to the PointwiseMultiplicative protocol.

To clarify the direction: optimizers can use:

  • TangentVector addition/subtraction via AdditiveArithmetic conformances.
  • TangentVector multiplication/division via PointwiseMultiplicative conformances.
  • TangentVector elementary functions via ElementaryFunctions conformances.
  • TangentVector scalar addition/subtraction/multiplication via VectorProtocol conformances.
  • Update Model with Model.TangentVector via Differentiable.move(along:).

One-line optimizer (Riemann SGD):

model.move(along: direction.scaled(by: -learningRate))
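
As a slightly larger hedged sketch in the same spirit (illustrative names and signature, not the library's final optimizer code), a momentum SGD step can be written entirely with these conformances and no key-path iteration, assuming Model.TangentVector.VectorSpaceScalar == Float:

// Hedged sketch: momentum SGD via generalized vector math only.
func momentumSGDStep<Model: Differentiable>(
    _ model: inout Model,
    along direction: Model.TangentVector,
    velocity: inout Model.TangentVector,
    learningRate: Float,
    momentum: Float
) where Model.TangentVector: VectorProtocol,
        Model.TangentVector.VectorSpaceScalar == Float {
    // velocity = momentum * velocity - learningRate * direction
    velocity = velocity.scaled(by: momentum) + direction.scaled(by: -learningRate)
    model.move(along: velocity)
}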

@eaplatanios
Contributor

@dan-zheng given that this functionality is already supported what is currently preventing you from simplifying other optimizers, such as Adam?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Adam and other optimizers are not blocked! This PR was done when PointwiseMultiplicative was not available :)

@eaplatanios
Contributor

I noticed that you reverted the changes that switched from key paths to the vector protocol, so I assumed that something was not working out. If everything works fine now, then why not move Adam and the other optimizers off key paths altogether?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

We just haven't gotten around to it. Contributions are welcome!

@eaplatanios
Contributor

I just gave it a quick look, and it seems I run into issues performing operations between Float (e.g., the learning rate) and TangentVector. If I change all the Floats to TangentVector.VectorSpaceScalar, then I can’t support the default values in the constructor, which are float literals. What’s a good workaround for this? I would like to avoid multiple extensions (e.g., for Float and Double), each with its own constructor.

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Tensor.VectorSpaceScalar is Float. This allows us to use Float scalars everywhere in optimizers. What specific operation was causing errors for you?

@eaplatanios
Contributor

For example:

error: binary operator '*' cannot be applied to operands of type 'Model.TangentVector' and 'Float'
        firstMoments = firstMoments * beta1
                       ~~~~~~~~~~~~ ^ ~~~~~

And if I switch to firstMoments.scaled(by: beta1) I get:

error: cannot invoke 'scaled' with an argument list of type '(by: Float)'
        firstMoments = firstMoments.scaled(by: beta1)
                                    ^

This is because we only have the constraint that Model.TangentVector: VectorProtocol. Should I add a constraint for Model.TangentVector.VectorSpaceScalar == Float? Would that support Tensor<Double> though?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Yes. And yes, it will support Tensor<Double>: Tensor’s VectorSpaceScalar is always Float.
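
For concreteness, a hedged sketch of the constraint being suggested (MyAdam and its members are illustrative, not the library's class):

// Hedged sketch: constraining the tangent vector's scalar type to Float lets
// plain Float hyperparameters (with Float-literal defaults) drive scaled(by:).
class MyAdam<Model: Differentiable>
    where Model.TangentVector: VectorProtocol,
          Model.TangentVector.VectorSpaceScalar == Float {
    var learningRate: Float
    var beta1: Float
    var firstMoments: Model.TangentVector = .zero

    init(learningRate: Float = 1e-3, beta1: Float = 0.9) {
        self.learningRate = learningRate
        self.beta1 = beta1
    }

    func updateFirstMoments(along direction: Model.TangentVector) {
        // firstMoments = beta1 * firstMoments + (1 - beta1) * direction
        firstMoments = firstMoments.scaled(by: beta1) + direction.scaled(by: 1 - beta1)
    }
}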

@eaplatanios
Contributor

Ok cool, thanks for the information. Another issue that pops up is that in Adam, for example, we need to compute sqrt(secondMoments), but sqrt is not supported by VectorProtocol. This is trickier, I believe. Have you thought of a workaround for that?

@rxwei
Contributor Author

rxwei commented Jun 28, 2019

Adding a constraint to ElementaryFunctions will give you access to Model.TangentVector.sqrt(_:).
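
Putting the pieces together, a hedged sketch of an Adam-style step with all three constraints (illustrative only; the library's Adam differs, e.g., in bias correction):

// Hedged sketch of an Adam-style update using AdditiveArithmetic (+),
// VectorProtocol (scaled(by:), adding(_:)), PointwiseMultiplicative (.*,
// reciprocal), and ElementaryFunctions (sqrt).
func adamStep<Model: Differentiable>(
    _ model: inout Model,
    along direction: Model.TangentVector,
    firstMoments: inout Model.TangentVector,
    secondMoments: inout Model.TangentVector,
    learningRate: Float, beta1: Float, beta2: Float, epsilon: Float
) where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions,
        Model.TangentVector.VectorSpaceScalar == Float {
    firstMoments = firstMoments.scaled(by: beta1) + direction.scaled(by: 1 - beta1)
    secondMoments = secondMoments.scaled(by: beta2) +
        (direction .* direction).scaled(by: 1 - beta2)
    let denominator = Model.TangentVector.sqrt(secondMoments).adding(epsilon)
    model.move(along: (firstMoments .* denominator.reciprocal).scaled(by: -learningRate))
}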

Labels
enhancement New feature or request

5 participants