diff --git a/notebooks/05.04-Feature-Engineering.ipynb b/notebooks/05.04-Feature-Engineering.ipynb
index c49feda41..c407d4525 100644
--- a/notebooks/05.04-Feature-Engineering.ipynb
+++ b/notebooks/05.04-Feature-Engineering.ipynb
@@ -27,7 +27,7 @@
  "\n",
  "The previous sections outline the fundamental ideas of machine learning, but all of the examples assume that you have numerical data in a tidy, ``[n_samples, n_features]`` format.\n",
  "In the real world, data rarely comes in such a form.\n",
- "With this in mind, one of the more important steps in using machine learning in practice is *feature engineering*: that is, taking whatever information you have about your problem and turning it in to numbers that you can use to build your feature matrix.\n",
+ "With this in mind, one of the more important steps in using machine learning in practice is *feature engineering*: that is, taking whatever information you have about your problem and turning it into numbers that you can use to build your feature matrix.\n",
  "\n",
  "In this section, we will cover a few common examples of feature engineering tasks: features for representing *categorical data*, features for representing *text*, and features for representing *images*.\n",
  "Additionally, we will discuss *derived features* for increasing model complexity and *imputation* of missing data.\n",
@@ -83,7 +83,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "It turns out that this is not generally a useful approach in Scikit-Learn: the package's models make the fundamental assumption that numerical features reflect algebraic quanitites.\n",
+ "It turns out that this is not generally a useful approach in Scikit-Learn: the package's models make the fundamental assumption that numerical features reflect algebraic quantities.\n",
  "Thus such a mapping would imply, for example, that *Queen Anne < Fremont < Wallingford*, or even that *Wallingford - Queen Anne = Fremont*, which (niche demographic jokes aside) does not make much sense.\n",
  "\n",
  "In this case, one proven technique is to use *one-hot encoding*, which effectively creates extra columns indicating the presence or absence of a category with a value of 1 or 0, respectively.\n",