This repository was archived by the owner on Jul 1, 2023. It is now read-only.

Add Adadelta Optimizer #302

Merged
merged 19 commits into tensorflow:master on Jun 28, 2019

Conversation

lakshya-sky
Contributor

No description provided.

@rxwei rxwei requested review from rxwei and dan-zheng June 27, 2019 18:38
@rxwei rxwei left a comment (Contributor)

Thank you!

public var step: Int = 0
public var accumulators: Model.AllDifferentiableVariables
public var accDelta: Model.AllDifferentiableVariables

Contributor

Remove extra line.

/// The current step.
public var step: Int = 0
public var accumulators: Model.AllDifferentiableVariables
public var accDelta: Model.AllDifferentiableVariables
Contributor

  • Could you add a doc comment that explains this property?
  • Avoid abbreviations in this property name, since accDelta is not a term of art and may be confusing; see the sketch below for one possibility.
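
One way the renamed, documented property might look (a sketch only; it uses the `accumulatedDelta` name and the doc-comment wording suggested later in this review):

/// The accumulated parameter updates.
public var accumulatedDelta: Model.AllDifferentiableVariables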

along direction: Model.TangentVector) {
update(&model.allDifferentiableVariables, along: direction)
}
}
Contributor

Please leave a trailing empty line in the file.

public typealias Model = Model
/// The learning rate.
public var learningRate: Float
///decay factor, corresponding to fraction of gradient to keep at each time step.
Contributor

Suggested change
///decay factor, corresponding to fraction of gradient to keep at each time step.
/// The decay factor, corresponding to fraction of gradient to keep at each time step.

accNew += (1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])
accumulators[keyPath: kp] = accNew

var stepSize = direction[keyPath: kp] * sqrt(updates[keyPath: kp] + epsilon)
Contributor

Remove extra whitespace.

Suggested change
var stepSize = direction[keyPath: kp] * sqrt(updates[keyPath: kp] + epsilon)
var stepSize = direction[keyPath: kp] * sqrt(updates[keyPath: kp] + epsilon)

accumulators[keyPath: kp] = accNew

var stepSize = direction[keyPath: kp] * sqrt(updates[keyPath: kp] + epsilon)
stepSize /= sqrt(accumulators[keyPath: kp] + epsilon)
Contributor

Remove extra whitespace.

Suggested change
stepSize /= sqrt(accumulators[keyPath: kp] + epsilon)
stepSize /= sqrt(accumulators[keyPath: kp] + epsilon)

var updatesNew = updates[keyPath: kp] * rho
updatesNew += (1 - rho) * (stepSize * stepSize)
updates[keyPath: kp] = updatesNew

Contributor

Remove the redundant empty line.

Suggested change

public var decay: Float
/// The current step.
public var step: Int = 0
/// Accumulate Gradients
Contributor

Two suggestions on documentation comments:

  • Properties should be described with a noun phrase ending with a period.
  • Documentation comments for mathematical variables should use the term of art. In this case, we should switch to the accurate, proper terminology that's used by the paper.
Suggested change
/// Accumulate Gradients
/// The accumulated, exponentially decaying average of squared gradients.

/// Accumulate Gradients
public var accumulators: Model.AllDifferentiableVariables
/// Accumulate Updates(here StepSizes)
public var updates: Model.AllDifferentiableVariables
Contributor

Suggested change
public var updates: Model.AllDifferentiableVariables
public var accumulatedDelta: Model.AllDifferentiableVariables

Contributor

Here using the word "delta" would make this variable better correspond to the 𝚫x term in the paper.
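
For reference, the update rules from the ADADELTA paper (Zeiler, 2012) that these two accumulators track, written in LaTeX:

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\Delta x_t = -\frac{RMS[\Delta x]_{t-1}}{RMS[g]_t} \, g_t
E[\Delta x^2]_t = \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2
x_{t+1} = x_t + \Delta x_t

where RMS[z]_t = \sqrt{E[z^2]_t + \epsilon}. In this PR, `accumulators` (later `averageSquared`) corresponds to E[g^2] and `accumulatedDelta` to E[\Delta x^2].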

public var step: Int = 0
/// Accumulate Gradients
public var accumulators: Model.AllDifferentiableVariables
/// Accumulate Updates(here StepSizes)
Contributor

Suggested change
/// Accumulate Updates(here StepSizes)
/// The accumulated parameter updates.

public init(
for model: __shared Model,
learningRate: Float = 1,
rho: Float = 0.9,
Contributor

Keras uses rho = 0.95 by default.
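
If the Keras default were adopted here, the parameter would presumably become:

rho: Float = 0.95,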


// Update Float & Double Tensor variables.
for kp in model.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
var accNew = rho * accumulators[keyPath: kp]
Contributor

For clarity, I'd suggest rewriting this step as

averageSquared[keyPath: kp] *= rho
averageSquared[keyPath: kp] += (1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])

This way, you won't need to define extra variables. Same suggestion for the Tensor<Double> handling block.
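
For reference, the analogous Tensor<Double> rewrite would read as follows (the same lines appear in the later revision of the diff, with `averageSquared` as the property name):

averageSquared[keyPath: kp] *= Double(rho)
averageSquared[keyPath: kp] +=
    (1 - Double(rho)) * (direction[keyPath: kp] * direction[keyPath: kp])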

stepSize /= sqrt(accumulators[keyPath: kp] + epsilon)
model[keyPath: kp] -= learningRate * stepSize

var updatesNew = updates[keyPath: kp] * rho
Contributor

I would suggest changing this step to the following:

accumulatedDelta *= rho
accumulatedDelta += (1 - rho) * stepSize.squared()

Same suggestion for the Tensor<Double> handling block.
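
Applied inside the key-path loop of the Tensor<Double> block, this suggestion would presumably read (a sketch, with the key-path subscripts spelled out):

accumulatedDelta[keyPath: kp] *= Double(rho)
accumulatedDelta[keyPath: kp] += (1 - Double(rho)) * stepSize.squared()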

public func update(_ model: inout Model.AllDifferentiableVariables,
along direction: Model.AllDifferentiableVariables) {
step += 1
let learningRate = self.learningRate * 1 / (1 + decay * Float(step))
Contributor

* 1 is redundant.

Suggested change
let learningRate = self.learningRate * 1 / (1 + decay * Float(step))
let learningRate = self.learningRate / (1 + decay * Float(step))
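
(For a concrete illustration with hypothetical values: learningRate = 1 and decay = 0.01 give an effective rate of 1 / (1 + 0.01 * 100) = 0.5 at step 100.)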

for kp in model.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
averageSquared[keyPath: kp] *= rho
averageSquared[keyPath: kp] +=
(1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])
Contributor

Indent by 4.

Suggested change
(1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])
    (1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])

averageSquared[keyPath: kp] +=
(1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])
var stepSize = direction[keyPath: kp] *
sqrt(accumulatedDelta[keyPath: kp] + epsilon)
Contributor

Suggested change
sqrt(accumulatedDelta[keyPath: kp] + epsilon)
    sqrt(accumulatedDelta[keyPath: kp] + epsilon)

for kp in model.recursivelyAllWritableKeyPaths(to: Tensor<Double>.self) {
averageSquared[keyPath: kp] *= Double(rho)
averageSquared[keyPath: kp] +=
(1 - Double(rho)) * (direction[keyPath: kp] * direction[keyPath: kp])
Contributor

Suggested change
(1 - Double(rho)) * (direction[keyPath: kp] * direction[keyPath: kp])
    (1 - Double(rho)) * (direction[keyPath: kp] * direction[keyPath: kp])

averageSquared[keyPath: kp] +=
(1 - Double(rho)) * (direction[keyPath: kp] * direction[keyPath: kp])
var stepSize = direction[keyPath: kp] *
sqrt(accumulatedDelta[keyPath: kp] + Double(epsilon))
Contributor

Suggested change
sqrt(accumulatedDelta[keyPath: kp] + Double(epsilon))
    sqrt(accumulatedDelta[keyPath: kp] + Double(epsilon))
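
Putting the pieces together, the Tensor<Float> half of the update might read as follows once the suggestions above are applied (a sketch assembled from the hunks quoted in this review, not necessarily the exact merged code; the Tensor<Double> half is analogous with Double casts):

public func update(_ model: inout Model.AllDifferentiableVariables,
                   along direction: Model.AllDifferentiableVariables) {
    step += 1
    let learningRate = self.learningRate / (1 + decay * Float(step))
    for kp in model.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
        // Accumulate the exponentially decaying average of squared gradients.
        averageSquared[keyPath: kp] *= rho
        averageSquared[keyPath: kp] +=
            (1 - rho) * (direction[keyPath: kp] * direction[keyPath: kp])
        // Compute the update step from the two running averages.
        var stepSize = direction[keyPath: kp] *
            sqrt(accumulatedDelta[keyPath: kp] + epsilon)
        stepSize /= sqrt(averageSquared[keyPath: kp] + epsilon)
        model[keyPath: kp] -= learningRate * stepSize
        // Accumulate the exponentially decaying average of squared updates.
        accumulatedDelta[keyPath: kp] *= rho
        accumulatedDelta[keyPath: kp] += (1 - rho) * stepSize.squared()
    }
    // (Tensor<Double> handling omitted in this sketch.)
}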

@@ -355,3 +355,98 @@ public class AdaGrad<Model: Layer>: Optimizer
update(&model.allDifferentiableVariables, along: direction)
}
}

/// Adadelta Optimizer
Contributor

  • Use the canonical name (the name in the paper) to describe this class.
  • End all comments with a period.
  • Make the "O" in "Optimizer" be lowercased, for consistency with API documentation on other optimizers in this file.
Suggested change
/// Adadelta Optimizer
/// ADADELTA optimizer.


/// Adadelta Optimizer
///
/// It is a method that uses the magnitude of recent gradients
Contributor

API documentation should fit to size (i.e. to 100 columns).

I'd suggest changing the summary to the following, similar to the Keras documentation.

ADADELTA is a more robust extension of AdaGrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to AdaGrad, in the original version of ADADELTA you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be set, as in most other optimizers.

step += 1
let learningRate = self.learningRate * 1 / (1 + decay * Float(step))

// Update Float & Double Tensor variables.
Contributor

Suggested change
// Update Float & Double Tensor variables.
// Update `Tensor<Float>` and `Tensor<Double>` variables.


/// ADADELTA optimizer.
///
/// ADADELTA is a more robust extension of AdaGrad that adapts learning rates
Contributor

This block of API documentation has lines whose length varies significantly from line to line. Could you make sure each line is filled with words until it reaches 100 columns? This is what I meant by suggesting "fit to size". Hope this clarifies things!
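
For illustration, the summary wrapped to roughly 100 columns might look like this (a sketch based on the suggested wording above; the merged text may differ):

/// ADADELTA optimizer.
///
/// ADADELTA is a more robust extension of AdaGrad that adapts learning rates based on a moving
/// window of gradient updates, instead of accumulating all past gradients. This way, ADADELTA
/// continues learning even when many updates have been done. Compared to AdaGrad, in the
/// original version of ADADELTA you don't have to set an initial learning rate. In this
/// version, initial learning rate and decay factor can be set, as in most other optimizers.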

lakshya-sky and others added 3 commits June 28, 2019 13:07
Use "AdaGrad" and "ADADELTA" in documentation to stay faithful to paper spelling.
@rxwei rxwei (Contributor) commented Jun 28, 2019

Thanks @dan-zheng for reordering the class definitions!

@rxwei rxwei merged commit a41903b into tensorflow:master Jun 28, 2019
@Shashi456 (Contributor)

#127
