
Commit 1561935

visual backpropagation
1 parent 028e62b commit 1561935

8 files changed: +30 −1 lines changed

README.md

+2 −1
@@ -77,14 +77,15 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 
 ### General Questions
 
-- [Why are you and other people sometimes implement machine learning algorithms from scratch?](./faq/implementing-from-scratch.md)]
+- [Why are you and other people sometimes implement machine learning algorithms from scratch?](./faq/implementing-from-scratch.md)
 - [What learning path/discipline in data science I should focus on?](./faq/data-science-career.md)
 - [At what point should one start contributing to open source?](./open-source.md)
 - [How important do you think having a mentor is to the learning process?](./mentor.md)
 - [Where are the best online communities centered around data science/machine learning or python?](./ml-python-communities.md)
 
 ### Questions about ML Concepts
 
+- [Can you give a visual explanation for the back propagation algorithm for neural networks?](./faq/visual-backpropagation.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./faq/standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./faq/issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./faq/difference-deep-and-normal-learning.md)

faq/README.md

+1
@@ -28,6 +28,7 @@ Sebastian
 
 ### Questions about ML Concepts
 
+- [Can you give a visual explanation for the back propagation algorithm for neural networks?](./visual-backpropagation.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./difference-deep-and-normal-learning.md)

faq/visual-backpropagation.md

+27
@@ -0,0 +1,27 @@
# Can you give a visual explanation for the back propagation algorithm for neural networks?

Let's assume we are really into mountain climbing, and to add a little extra challenge, we cover our eyes this time so that we can't see where we are and can't tell when we have accomplished our "objective," that is, reaching the top of the mountain.

Since we can't see the path ahead of us, we let our intuition guide us: assuming that the mountain top is the "highest" point of the mountain, we expect that the steepest path will lead us to the top most efficiently.

We approach this challenge by iteratively "feeling" around us and taking a step in the direction of the steepest ascent -- let's call it "gradient ascent." But what do we do if we reach a point where we can't ascend any further, i.e., where every direction leads downwards? At this point, we may have already reached the mountain's top, but we could just as well have reached a smaller plateau ... we don't know.

Essentially, this is just an analogy for gradient ascent optimization (basically the counterpart to minimizing a cost function via gradient descent). However, this is not specific to backpropagation; it is just one way to minimize a convex cost function (if there is only a global minimum) or a non-convex cost function (which has local minima, like the "plateaus" that make us think we have reached the mountain's top). Using a little visual aid, we could picture a non-convex cost function with only one parameter (where the blue ball is our current location) as follows:

![](./visual-backpropagation/nonconvex-cost.png)
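
To make the analogy a bit more concrete, here is a minimal sketch of plain gradient descent on a made-up, non-convex cost function with a single parameter (the function, starting point, and learning rate are purely illustrative choices, not values taken from the figure):

```python
import numpy as np

def cost(w):
    # a made-up, non-convex "mountain landscape" with a single parameter w
    return np.sin(3 * w) + 0.5 * w**2

def gradient(w):
    # analytic derivative of the cost above
    return 3 * np.cos(3 * w) + w

w = 2.0              # arbitrary starting position
learning_rate = 0.1

for _ in range(50):
    w -= learning_rate * gradient(w)   # step "downhill" along the steepest descent

print(w, cost(w))    # we end up in *a* minimum -- not necessarily the global one
```

Depending on where we start, we may get stuck in one of the local minima ("plateaus") instead of the global one -- exactly the blindfolded climber's problem from above.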

Now, backpropagation is just back-propagating the cost over multiple "levels" (or layers). E.g., if we have a multi-layer perceptron, we can picture forward propagation (passing the input signal through a network while multiplying it by the respective weights to compute an output) as follows:

![](./visual-backpropagation/forward-propagation.png)
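
As a rough sketch of what that picture describes (a tiny, hypothetical 2-layer network with sigmoid activations; the layer sizes, weights, and input values are arbitrary and not taken from the figure), the forward pass is just repeated matrix multiplication followed by a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical layer sizes: 3 inputs -> 4 hidden units -> 1 output (no bias units, for brevity)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))

x = np.array([[0.5, -1.2, 3.0]])    # a single training example as a row vector

# forward propagation: multiply by the weights, apply the activation function
a1 = sigmoid(x @ W1)                # hidden-layer activations, shape (1, 4)
output = sigmoid(a1 @ W2)           # network output, shape (1, 1)
```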

And in backpropagation, we "simply" backpropagate the error (the "cost" that we compute by comparing the calculated output to the known, correct target output), which we then use to update the model parameters:

![](./visual-backpropagation/backpropagation.png)
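
Continuing the hypothetical sketch from above (same toy network, with a squared-error cost for a single example), the backward pass pushes that error through the layers in reverse order to obtain the weight gradients:

```python
y = np.array([[1.0]])                     # known, correct target output

# cost = 1/2 * (output - y)^2, so dCost/dOutput = (output - y)
error = output - y

# backpropagate through the output layer (sigmoid derivative: s * (1 - s))
delta2 = error * output * (1 - output)
grad_W2 = a1.T @ delta2

# ... and through the hidden layer
delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
grad_W1 = x.T @ delta1

# use the gradients to update the model parameters (gradient descent step)
learning_rate = 0.1
W2 -= learning_rate * grad_W2
W1 -= learning_rate * grad_W1
```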

It may have been some time since pre-calc, but it is essentially all based on the simple chain rule that we use for nested functions:

![](./visual-backpropagation/chain_rule_1.png)

![](./visual-backpropagation/chain_rule_2.png)
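
In symbols (using generic names for the intermediate quantities, not the notation from the figures above): the derivative of a nested function is the product of the derivatives of its parts, and the gradient of the cost with respect to an early layer's weights is the same kind of product, assembled layer by layer:

```latex
% chain rule for a nested function F(x) = f(g(x)):
\frac{dF}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}

% applied to a two-layer network with hidden activation a_1 and output o:
\frac{\partial \, \mathrm{cost}}{\partial W_1}
  = \frac{\partial \, \mathrm{cost}}{\partial o}
    \cdot \frac{\partial o}{\partial a_1}
    \cdot \frac{\partial a_1}{\partial W_1}
```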

Instead of doing this "manually," we can use computational tools (called "automatic differentiation"), and backpropagation is basically the "reverse" mode of this auto-differentiation. Why reverse and not forward? Because it is computationally cheaper! If we did it forward, we would successively multiply large matrices for each layer until we finally multiplied a large matrix by a vector in the output layer. However, if we start backwards, that is, by multiplying a matrix by a vector, every intermediate result is again a vector, and so forth. So, I'd say the beauty of backpropagation is that we are doing more efficient matrix-vector multiplications instead of matrix-matrix multiplications.
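
A rough way to see why the reverse order is cheaper (purely illustrative sizes, not values from the text above): the gradient of a scalar cost is a product of layer-to-layer Jacobians with a final row vector, and grouping the multiplications from the back keeps every intermediate result a vector instead of a large matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical Jacobians of a deep network with a scalar cost
J3 = rng.normal(size=(1, 500))      # d cost / d layer3 (a row vector)
J2 = rng.normal(size=(500, 500))    # d layer3 / d layer2
J1 = rng.normal(size=(500, 500))    # d layer2 / d layer1

# "forward" grouping: a 500x500 matrix-matrix product first -- expensive
grad_forward = J3 @ (J2 @ J1)

# "reverse" grouping (backpropagation): vector-matrix products only,
# every intermediate result stays a 1x500 row vector -- much cheaper
grad_reverse = (J3 @ J2) @ J1

assert np.allclose(grad_forward, grad_reverse)   # same result, different cost
```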
5 binary image files added under faq/visual-backpropagation/ (nonconvex-cost.png, forward-propagation.png, backpropagation.png, chain_rule_1.png, chain_rule_2.png)
