Commit 4ef8190

best ml algoo

1 parent ec56587

4 files changed: +26 −2 lines changed

README.md (+1)

@@ -105,6 +105,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [What are the different dimensionality reduction methods in machine learning?](./faq/dimensionality-reduction.md)
 - [What is Euclidean distance in terms of machine learning?](./faq/euclidean-distance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./faq/naive-bayes-vs-logistic-regression.md)
+- [Which machine learning algorithms can be considered as among the best?](./faq/best-ml-algo.md)
 
 ### Questions about the Book

docs/feedback.md (+7 −2)

@@ -81,8 +81,7 @@ your book is the best for me to use as a textbook.
 
 > An amazing book, really can't praise it enough. Presentation of technical subjects are explained very clearly. The author is very thorough in his writing making sure to fill in the details so you don't get left behind. It was a wonderful experience. Clearly written examples showing the theory and practice of machine learning. An invaluable tutorial.
 As a bonus, chapter on "Parallelizing Neural Network Training with Theano" was a great addition! A very exciting topic, author really went all out. Hands down the best Python Machine learning book available.
-
-NazHuz via [Amazon](http://www.amazon.com/gp/customer-reviews/RURYHN1G3SRMZ/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=1783555130)
+-- NazHuz via [Amazon](http://www.amazon.com/gp/customer-reviews/RURYHN1G3SRMZ/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=1783555130)
 
 
 <hr>

@@ -91,3 +90,9 @@ I am reading your recent book on ML and I think this is on the best book I ever
 -- Michaël Lambé
 
 [![](./images/armand_g_tweet.png)](https://twitter.com/arm_gilles/status/658927560932401154)
+
+
+<hr>
+
+This is the first Python Machine Learning book that actually made sense. Sebastian managed to combine theory (math behind the models, how to implement an algorithm from scratch) with practice (how to actually implement it using scikit-learn) in a way no book has done thus far. I was really impressed (and surprised) to see how 'easy' it is to implement the more simple algorithms from scratch [...]
+-- C. Herther via [Amazon](http://www.amazon.com/gp/customer-reviews/R2I0D8HNIQODVA/ref=cm_cr_pr_rvw_ttl?ie=UTF8&ASIN=B00YSILNL0)

faq/README.md (+1)

@@ -48,6 +48,7 @@ Sebastian
 - [What are the different dimensionality reduction methods in machine learning?](./dimensionality-reduction.md)
 - [What is Euclidean distance in terms of machine learning?](./euclidean-distance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./naive-bayes-vs-logistic-regression.md)
+- [Which machine learning algorithms can be considered as among the best?](./best-ml-algo.md)
 
 ### Questions about the Book
faq/best-ml-algo.md (+17, new file)

@@ -0,0 +1,17 @@
+# Which machine learning algorithms can be considered as among the best?
+
+I recommend taking a look at
+
+Wolpert, D. H., & Macready, W. G. (1997). "[No Free Lunch Theorems for Optimization](http://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf)." IEEE Transactions on Evolutionary Computation, 1(1), 67.
+
+Unfortunately, there's no real answer to this question: different datasets, questions, and assumptions require different algorithms -- or, in other words, we haven't found the Master Algorithm yet.
+
+But let me at least write down a few thoughts about different classifiers:
+
+- Both logistic regression and SVMs work great for linear problems; logistic regression may be preferable for very noisy data.
+- Naive Bayes may work better than logistic regression for small training set sizes. It is also pretty fast: for a large multi-class problem, you only have to train one classifier, whereas with SVMs or logistic regression you have to resort to One-vs-Rest or One-vs-One (alternatively, you could implement multinomial/softmax regression). Another point in its favor is that you don't have to worry much about hyperparameter optimization -- if you estimate the class priors from the training set, there are actually no hyperparameters to tune.
+- Kernel SVMs and kernel logistic regression are preferable to the linear models for nonlinear data.
+- k-nearest neighbors can also work quite well in practice for datasets with a large number of samples and relatively low dimensionality.
+- Random Forests and Extremely Randomized Trees are very robust and work well across a whole range of problems, both linear and nonlinear.
+
+Personally, I tend to prefer a multi-layer neural network in most cases, given that my dataset is sufficiently large. In my experience, its generalization performance was almost always superior to that of the other approaches listed above. But again, it really depends on the given dataset.
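The classifier list in the new FAQ file invites a side-by-side comparison. Below is a minimal sketch (not part of the commit) that pits the classifiers mentioned above against each other via cross-validation; it assumes a recent scikit-learn, and the Iris dataset and hyperparameter values are placeholders chosen purely for illustration.

```python
# Sketch: compare the classifiers discussed in faq/best-ml-algo.md on one
# dataset. Dataset and hyperparameters are illustrative assumptions, not
# recommendations from the FAQ.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(kernel="linear"),
    "RBF-kernel SVM": SVC(kernel="rbf"),
    "Gaussian naive Bayes": GaussianNB(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "Extremely randomized trees": ExtraTreesClassifier(n_estimators=100),
}

# 5-fold cross-validation; the ranking will differ from dataset to dataset,
# which is exactly the "no free lunch" point the FAQ makes.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:28s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On one dataset the ranking can look decisive; on another it can flip, so a loop like this is a starting point for model selection, not a verdict on the "best" algorithm.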
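The naive Bayes bullet makes two separable claims: a single model covers a multi-class problem, and with class priors estimated from the training data there is nothing to tune. Here is a small sketch of that point, again an illustration rather than part of the commit; the digits dataset is an assumed stand-in for "a large multi-class problem", and the One-vs-Rest reduction (which scikit-learn's SVMs otherwise apply internally) is made explicit for contrast.

```python
# Sketch: one GaussianNB model vs. an explicit One-vs-Rest ensemble of
# linear SVMs on a 10-class problem. Settings are illustrative only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)  # 10 classes
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One model, priors estimated from the training data, no hyperparameters.
nb = GaussianNB().fit(X_train, y_train)

# One-vs-Rest trains one binary SVM per class (and C must be chosen).
ovr = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000)).fit(X_train, y_train)

print("naive Bayes test accuracy:", nb.score(X_test, y_test))
print("underlying SVMs in One-vs-Rest:", len(ovr.estimators_))
```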
