
Commit 30d0491
equations ch04
1 parent 082347d

File tree: 2 files changed (+71, -0 lines)

docs/equations/pymle-equations.pdf (8.76 KB, binary file not shown)

docs/equations/pymle-equations.tex (+71 lines)
@@ -755,6 +755,77 @@ \section{K-nearest neighbors -- a lazy learning algorithm}
\section{Summary}

\chapter{Building Good Training Sets -- Data Pre-Processing}
\section{Dealing with missing data}
\subsection{Eliminating samples or features with missing values}
\subsection{Imputing missing values}
\subsection{Understanding the scikit-learn estimator API}
\section{Handling categorical data}
\subsection{Mapping ordinal features}
\subsection{Encoding class labels}
\subsection{Performing one-hot encoding on nominal features}
\section{Partitioning a dataset into training and test sets}
\section{Bringing features onto the same scale}

Now, there are two common approaches to bringing different features onto the same scale: \textit{normalization} and \textit{standardization}. Those terms are often used quite loosely in different fields, and the meaning has to be derived from the context. Most often, \textit{normalization} refers to the rescaling of the features to a range of $[0, 1]$, which is a special case of min-max scaling. To normalize our data, we can simply apply min-max scaling to each feature column, where the new value $x_{norm}^{(i)}$ of a sample $x^{(i)}$ is calculated as follows:

\[
x_{norm}^{(i)} = \frac{x^{(i)} - x_{min}}{x_{max} - x_{min}}
\]

Here, $x^{(i)}$ is a particular sample, and $x_{min}$ and $x_{max}$ are the smallest and largest values in a feature column, respectively.
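
As a quick sanity check of the formula, consider a made-up feature column with the three values 10, 50, and 90, so that $x_{min} = 10$ and $x_{max} = 90$; the sample $x^{(i)} = 50$ is then rescaled to

\[
x_{norm}^{(i)} = \frac{50 - 10}{90 - 10} = 0.5,
\]

while the minimum and maximum map to 0 and 1, respectively.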

[...] Furthermore, standardization maintains useful information about outliers and makes the algorithm less sensitive to them, in contrast to min-max scaling, which scales the data to a limited range of values.

The procedure of standardization can be expressed by the following equation:

\[
x_{std}^{(i)} = \frac{x^{(i)} - \mu_{x}}{\sigma_{x}}
\]

Here, $\mu_{x}$ is the sample mean of a particular feature column and $\sigma_{x}$ is the corresponding standard deviation.
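
To illustrate with made-up numbers, take a feature column with the values 2, 4, 4, 4, 5, 5, 7, and 9, which has mean $\mu_{x} = 5$ and (population) standard deviation $\sigma_{x} = 2$; the sample $x^{(i)} = 9$ is then standardized to

\[
x_{std}^{(i)} = \frac{9 - 5}{2} = 2,
\]

that is, it lies two standard deviations above the column mean, and the standardized column as a whole has zero mean and unit variance.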

\section{Selecting meaningful features}
\subsection{Sparse solutions with L1 regularization}

We recall from \textit{Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn}, that L2 regularization is one approach to reduce the complexity of a model by penalizing large individual weights, where we defined the L2 norm of our weight vector $\mathbf{w}$ as follows:

\[
L2: \lVert \mathbf{w} \rVert^{2}_{2} = \sum_{j=1}^{m} w^{2}_{j}
\]

Another approach to reduce the model complexity is the related \textit{L1 regularization}:

\[
L1: \lVert \mathbf{w} \rVert_{1} = \sum_{j=1}^{m} |w_j|
\]
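
For a concrete (made-up) weight vector $\mathbf{w} = (3, -4)^{T}$, the two penalty terms evaluate to

\[
\lVert \mathbf{w} \rVert^{2}_{2} = 3^{2} + (-4)^{2} = 25, \qquad \lVert \mathbf{w} \rVert_{1} = |3| + |-4| = 7,
\]

which illustrates that the L1 penalty grows only linearly in the individual weights; this is the property that tends to drive many weights exactly to zero and yields the sparse solutions referred to in this subsection.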

\subsection{Sequential feature selection algorithms}

Based on the preceding definition of SBS, we can outline the algorithm in 4 simple steps (a worked illustration of one iteration follows the list):

\begin{enumerate}
\item Initialize the algorithm with $k=d$, where $d$ is the dimensionality of the full feature space $\mathbf{X}_d$.
\item Determine the feature $x^{-}$ that maximizes the criterion: $x^{-} = \text{arg max}\, J(\mathbf{X}_k - x)$, where $x \in \mathbf{X}_k$.
\item Remove the feature $x^{-}$ from the feature set: $\mathbf{X}_{k-1} := \mathbf{X}_k - x^{-}; \quad k := k-1$.
\item Terminate if $k$ equals the number of desired features; if not, go to step 2.
\end{enumerate}
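
As a small, hypothetical illustration of one iteration: starting from $d = 3$ features with $\mathbf{X}_3 = \{x_1, x_2, x_3\}$, step 2 evaluates the criterion on every subset that omits exactly one feature,

\[
J(\{x_1, x_2\}), \quad J(\{x_1, x_3\}), \quad J(\{x_2, x_3\}),
\]

and step 3 removes the feature whose omission yields the largest criterion value (for example, $x^{-} = x_3$ if the first subset scores highest), leaving $\mathbf{X}_2 = \{x_1, x_2\}$ and $k = 2$; the procedure then repeats from step 2 until $k$ equals the desired number of features.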

\section{Assessing feature importance with random forests}
\section{Summary}

\newpage
... to be continued ...
