[Optimizer] Simplify optimizers using generalized vector math. #218
Conversation
- velocity = model.allDifferentiableVariables
- for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
- velocity[keyPath: kp].resetToZero()
- }
- for kp in velocity.recursivelyAllWritableKeyPaths(to: Tensor<Double>.self) {
- velocity[keyPath: kp].resetToZero()
- }

A question one might bring up is: would deleting this logic make it impossible to optimize models whose layers/parameters are nested within arrays? The answer is: no, there's no need to worry at this point. What makes generalized vector math different is that all operations on these aggregate tangent vectors strictly follow how their |
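To make that point concrete, here is a minimal sketch. The `MLP` model below is hypothetical, and the sketch assumes the compiler-synthesized `TangentVector` conforms to `VectorProtocol` (as the swift-apis `Layer`/`Module` requirements provide): vector operations act on the whole aggregate tangent vector, including parameters inside nested layers, so no key-path iteration is needed to reset or scale optimizer state.

```swift
import TensorFlow

// Hypothetical model with nested layers; its `TangentVector` is compiler-synthesized.
struct MLP: Layer {
    var hidden = Dense<Float>(inputSize: 4, outputSize: 8, activation: relu)
    var output = Dense<Float>(inputSize: 8, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        output(hidden(input))
    }
}

// Replaces the per-key-path `resetToZero()` loops above: `.zero` and `scaled(by:)`
// act on every nested tensor in the aggregate tangent vector at once.
var velocity = MLP.TangentVector.zero
velocity = velocity.scaled(by: 0.9)
```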
Should we wait for this patch before contributing more Optimizers? |
@Mistobaan I think ideally yes, because any optimizers you add now will need to be cleaned up in this PR. |
However, this causes a type checking ambiguity.
…o 'VectorProtocol'. (#25525)
- Add broadcasting `adding(_:)` and `subtracting(_:)` to `VectorProtocol`. These are important for aggregate algorithms such as machine learning optimizers (tensorflow/swift-apis#218).
- Make their implementations compiler-derivable.
- Add operators for `adding(_:)` and `subtracting(_:)` in a protocol extension.
- Comment out all operators in protocol extensions for `VectorProtocol` because a source breakage has been found.
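Roughly, the additions described in that commit might look like the following. This is a simplified sketch, not the actual standard-library declarations; the real `VectorProtocol` has additional requirements and mutating variants.

```swift
// Simplified sketch of the surface described in the commit message above.
public protocol VectorProtocol: AdditiveArithmetic {
    associatedtype VectorSpaceScalar
    func adding(_ x: VectorSpaceScalar) -> Self        // broadcasting addition of a scalar
    func subtracting(_ x: VectorSpaceScalar) -> Self   // broadcasting subtraction of a scalar
    func scaled(by scalar: VectorSpaceScalar) -> Self  // scalar multiplication
}

// Operators defined once in a protocol extension, in terms of the requirements.
extension VectorProtocol {
    static func + (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.adding(rhs) }
    static func - (lhs: Self, rhs: VectorSpaceScalar) -> Self { lhs.subtracting(rhs) }
    static func * (lhs: VectorSpaceScalar, rhs: Self) -> Self { rhs.scaled(by: lhs) }
}
```

When `Self` and `VectorSpaceScalar` coincide (for example, `Float`), these operators overlap with the `AdditiveArithmetic` ones at call sites, which is presumably the kind of type-checking ambiguity and source breakage that led to the operators being commented out.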
@rxwei What is currently blocking this? |
This PR shows that |
First, derivatives are always tangent vectors, so the term "tangent vector" itself has no issues. Second, Swift type names describe the values of a type rather than the type or space itself, so type names do not end with "Space" or "Type". The actual problem, however, is that |
…ne-optimizers

# Conflicts:
# Sources/TensorFlow/Operators/Math.swift
# Sources/TensorFlow/Optimizer.swift
@rxwei What's the status of the refactoring? I noticed you reverted most of the changes so I was wondering if you still plan to implement them soon. |
Yes, they need to be implemented with constraints to the |
Should I go ahead and open a PR for a new optimizer or wait until you push the updates? |
Feel free to open new PRs! |
To clarify the direction, optimizers can use:
One-line optimizer (Riemannian SGD): model.move(along: direction.scaled(by: -learningRate)) |
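As a hedged illustration of that one-liner in context, the sketch below wraps it in a standalone step function. The function name and generic constraints are my own; in particular, it assumes `Model.TangentVector` conforms to `VectorProtocol` with `VectorSpaceScalar == Float`.

```swift
import TensorFlow

// Hypothetical standalone step function wrapping the one-line update quoted above.
func riemannSGDStep<Model: Differentiable>(
    _ model: inout Model,
    along direction: Model.TangentVector,
    learningRate: Float
) where Model.TangentVector: VectorProtocol,
        Model.TangentVector.VectorSpaceScalar == Float {
    // Move the model along the negative gradient direction.
    model.move(along: direction.scaled(by: -learningRate))
}
```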
@dan-zheng given that this functionality is already supported, what is currently preventing you from simplifying the other optimizers, such as Adam? |
I noticed that you reverted the changes that switch from key paths to the vector protocol, so I assumed that something was not working out. If everything works out fine now, then why not switch away from key paths altogether for Adam and the other optimizers? |
We just haven't got around to it. Contributions are welcome! |
I just gave it a quick look, and it seems I run into issues performing operations with |
For example:
And if I switch to
This is because we only have the constraint that |
Yes. And yes, it will support Tensor. Tensor’s VectorSpaceScalar is always Float. |
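If I read this correctly, a `Float` hyperparameter can then drive vector math on tensors of any floating-point scalar type. A small sketch under that assumption, relying on `Tensor`'s `VectorProtocol` conformance exposing `scaled(by:)` and the broadcasting `adding(_:)` described in the commit above:

```swift
import TensorFlow

// Even for Tensor<Double>, the VectorProtocol scalar type is Float,
// so Float-typed hyperparameters compose directly with vector math.
let gradient = Tensor<Double>([1.0, 2.0, 3.0])
let scaledGradient = gradient.scaled(by: 0.5)   // 0.5 is a Float here, not a Double
let shifted = gradient.adding(1.0)              // broadcasting add of a Float scalar
print(scaledGradient, shifted)
```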
OK, cool, thanks for the information. Another issue that pops up is that in Adam, for example, we need to compute |
Adding a constraint to |
This is the start of a series of PRs that make optimizers no longer depend on `KeyPathIterable` or require `AllDifferentiableVariables` to equal `TangentVector`. This is possible because we recently overhauled generalized vector math.

Changes include:
- `VectorProtocol`, which defines arithmetic operators in terms of `adding(_:)`, `subtracting(_:)`, and `scaled(by:)`.
- Changing `SGD.update(_:along:)` to use vector math (see the sketch after this list).
- Making `Optimizer.update(_:along:)` take `inout Model` instead of `inout Model.AllDifferentiableVariables`. This makes it easy to deprecate `AllDifferentiableVariables` later.
- An `update(_:along:)` that takes `inout Model`, so that existing optimizers conform to the protocol without removing the implementation. This is for short-term source compatibility.
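To show the direction concretely, here is a hedged sketch of the kind of SGD update this PR moves toward. It is simplified (no weight decay or Nesterov momentum), the type name is hypothetical, and the real `SGD` in swift-apis carries more state and options. The velocity is stored as a whole `TangentVector`, so no key paths or `AllDifferentiableVariables` are involved.

```swift
import TensorFlow

// Hypothetical, simplified SGD written purely with generalized vector math.
final class SimpleSGD<Model: Differentiable>
    where Model.TangentVector: VectorProtocol,
          Model.TangentVector.VectorSpaceScalar == Float {
    var learningRate: Float
    var momentum: Float
    // Whole-tangent-vector state; no per-parameter key-path bookkeeping.
    var velocity: Model.TangentVector = .zero

    init(learningRate: Float = 0.01, momentum: Float = 0.9) {
        self.learningRate = learningRate
        self.momentum = momentum
    }

    // Takes `inout Model`, matching the protocol change described above.
    func update(_ model: inout Model, along direction: Model.TangentVector) {
        velocity = velocity.scaled(by: momentum) + direction.scaled(by: -learningRate)
        model.move(along: velocity)
    }
}
```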