
Commit 384384d (merge of parents b8fd12e and 1bd4cbd)

Merge pull request rasbt#20 from NeerajSarwan/patch-1

changed model scoring value from 0.98 to 0.95

1 file changed: faq/model-selection-in-datascience.md (+1, −1)
@@ -6,7 +6,7 @@ meeting the project deadline
 high predictive performance
 high computational efficiency
 good interpretability
-These aspects differ from project to project, and it is really important to be clear about what you are trying to achieve beforehand. For example, let's say you managed to come up with a model scoring a 0.98 on your favorite performance metric scale from 0-1. Is it worth spending a couple of more days, weeks, months, or years to squeeze out another 0.05 improvement? It depends on your date of delivery. It depends on the available computing hardware. Eventually, it may also be important to know what's going on under the hood (would your collaborators be happy with another one of these multi-layer ensemble XGBoost frankensteins?)
+These aspects differ from project to project, and it is really important to be clear about what you are trying to achieve beforehand. For example, let's say you managed to come up with a model scoring a 0.95 on your favorite performance metric scale from 0-1. Is it worth spending a couple of more days, weeks, months, or years to squeeze out another 0.05 improvement? It depends on your date of delivery. It depends on the available computing hardware. Eventually, it may also be important to know what's going on under the hood (would your collaborators be happy with another one of these multi-layer ensemble XGBoost frankensteins?)
 
 When it comes to choosing between particular algorithms, I'd typically approach a new problem starting with a very simple hypothesis space -- for example, simple logistic regression or softmax regression. (Of course, this comes all after exploring and getting familiar with the dataset.) I'd use my initial model as a benchmark and try another bunch of simple classifiers with piece-wise or non-linear hypothesis spaces like decision trees, random forests, and (Rbf kernel) SVMs. If these don't cut it, I'd explore further options including MLPs, RNNs, and ConvNets if appropriate.
 
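The workflow described in the changed file -- fit a simple linear baseline first, then compare it against a handful of non-linear classifiers -- can be sketched with scikit-learn. The dataset, the specific estimators, and the 5-fold cross-validation setup are illustrative assumptions, not part of the original text:

```python
# Minimal sketch of the benchmark-first model-selection workflow.
# Dataset (iris) and hyperparameters are placeholder choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Start with a simple linear baseline, then try non-linear hypothesis spaces.
models = {
    "logistic regression (baseline)": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "RBF-kernel SVM": SVC(kernel="rbf"),
}

# Mean 5-fold cross-validation accuracy per model.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in models.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Only if the non-linear models clearly beat the baseline (and the gain justifies the extra complexity) would one move on to heavier options like MLPs or ConvNets.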
