Commit 4ef8190

best ml algoo

1 parent ec56587

4 files changed: +26 −2 lines changed

README.md (+1)

@@ -105,6 +105,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [What are the different dimensionality reduction methods in machine learning?](./faq/dimensionality-reduction.md)
 - [What is Euclidean distance in terms of machine learning?](./faq/euclidean-distance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./faq/naive-bayes-vs-logistic-regression.md)
+- [Which machine learning algorithms can be considered as among the best?](./faq/best-ml-algo.md)
 
 ### Questions about the Book

docs/feedback.md (+7 −2)

@@ -81,8 +81,7 @@ your book is the best for me to use as a textbook.
 
 > An amazing book, really can't praise it enough. Presentation of technical subjects are explained very clearly. The author is very thorough in his writing making sure to fill in the details so you don't get left behind. It was a wonderful experience. Clearly written examples showing the theory and practice of machine learning. An invaluable tutorial.
 As a bonus, chapter on "Parallelizing Neural Network Training with Theano" was a great addition! A very exciting topic, author really went all out. Hands down the best Python Machine learning book available.
-
-NazHuz via [Amazon](http://www.amazon.com/gp/customer-reviews/RURYHN1G3SRMZ/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=1783555130)
+-- NazHuz via [Amazon](http://www.amazon.com/gp/customer-reviews/RURYHN1G3SRMZ/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=1783555130)
 
 
 <hr>

@@ -91,3 +90,9 @@ I am reading your recent book on ML and I think this is on the best book I ever
 -- Michaël Lambé
 
 [![](./images/armand_g_tweet.png)](https://twitter.com/arm_gilles/status/658927560932401154)
+
+
+<hr>
+
+This is the first Python Machine Learning book that actually made sense. Sebastian managed to combine theory (math behind the models, how to implement an algorithm from scratch) with practice (how to actually implement it using scikit-learn) in a way no book has done thus far. I was really impressed (and surprised) to see how 'easy' it is to implement the more simple algorithms from scratch [...]
+-- C. Herther via [Amazon](http://www.amazon.com/gp/customer-reviews/R2I0D8HNIQODVA/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=B00YSILNL0)

faq/README.md (+1)

@@ -48,6 +48,7 @@ Sebastian
 - [What are the different dimensionality reduction methods in machine learning?](./dimensionality-reduction.md)
 - [What is Euclidean distance in terms of machine learning?](./euclidean-distance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./naive-bayes-vs-logistic-regression.md)
+- [Which machine learning algorithms can be considered as among the best?](./best-ml-algo.md)
 
 ### Questions about the Book
faq/best-ml-algo.md (+17, new file)

@@ -0,0 +1,17 @@
+# Which machine learning algorithms can be considered as among the best?
+
+I recommend taking a look at
+
+Wolpert, D. H., & Macready, W. G. (1997). "[No Free Lunch Theorems for Optimization](http://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf)." IEEE Transactions on Evolutionary Computation, 1(1), 67.
+
+Unfortunately, there's no real answer to this question: different datasets, questions, and assumptions require different algorithms -- or, in other words, we haven't found the Master Algorithm yet.
+
+But let me at least write down a few thoughts about different classifiers:
+
+- Both logistic regression and SVMs work great for linear problems; logistic regression may be preferable for very noisy data.
+- Naive Bayes may work better than logistic regression for small training set sizes. It is also pretty fast: for a large multi-class problem, you only have to train one classifier, whereas with SVMs or logistic regression you have to resort to One-vs-Rest or One-vs-One (alternatively, you could implement multinomial/softmax regression). Another point in its favor is that you don't have to worry much about hyperparameter optimization -- if you estimate the class priors from the training set, there are actually no hyperparameters to tune.
+- Kernel SVMs and kernel logistic regression are preferable to the linear models for nonlinear data.
+- k-nearest neighbors can also work quite well in practice for datasets with a large number of samples and relatively low dimensionality.
+- Random Forests and Extremely Randomized Trees are very robust and work well across a whole range of problems, both linear and nonlinear.
+
+Personally, I tend to prefer a multi-layer neural network in most cases, given that my dataset is sufficiently large. In my experience, its generalization performance was almost always superior to that of the other approaches listed above. But again, it really depends on the given dataset.
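The classifier list in the new FAQ file invites a side-by-side comparison. Below is a minimal sketch (not part of the commit) that pits the classifiers mentioned above against each other via cross-validation; it assumes a recent scikit-learn, and the Iris dataset and hyperparameter values are placeholders chosen purely for illustration.

```python
# Sketch: compare the classifiers discussed in faq/best-ml-algo.md on one
# dataset. Dataset and hyperparameters are illustrative assumptions, not
# recommendations from the FAQ.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(kernel="linear"),
    "RBF-kernel SVM": SVC(kernel="rbf"),
    "Gaussian naive Bayes": GaussianNB(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "Extremely randomized trees": ExtraTreesClassifier(n_estimators=100),
}

# 5-fold cross-validation; the ranking will differ from dataset to dataset,
# which is exactly the "no free lunch" point the FAQ makes.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:28s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On one dataset the ranking can look decisive; on another it can flip, so a loop like this is a starting point for model selection, not a verdict on the "best" algorithm.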
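The naive Bayes bullet makes two separable claims: a single model covers a multi-class problem, and with class priors estimated from the training data there is nothing to tune. Here is a small sketch of that point, again an illustration rather than part of the commit; the digits dataset is an assumed stand-in for "a large multi-class problem", and the One-vs-Rest reduction (which scikit-learn's SVMs otherwise apply internally) is made explicit for contrast.

```python
# Sketch: one GaussianNB model vs. an explicit One-vs-Rest ensemble of
# linear SVMs on a 10-class problem. Settings are illustrative only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)  # 10 classes
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One model, priors estimated from the training data, no hyperparameters.
nb = GaussianNB().fit(X_train, y_train)

# One-vs-Rest trains one binary SVM per class (and C must be chosen).
ovr = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000)).fit(X_train, y_train)

print("naive Bayes test accuracy:", nb.score(X_test, y_test))
print("underlying SVMs in One-vs-Rest:", len(ovr.estimators_))
```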
