
Commit 30d0491
equations ch04
1 parent 082347d

File tree: 2 files changed (+71, -0 lines)

docs/equations/pymle-equations.pdf (8.76 KB, binary file not shown)

docs/equations/pymle-equations.tex (+71 lines)
@@ -755,6 +755,77 @@ \section{K-nearest neighbors -- a lazy learning algorithm}
\section{Summary}

\chapter{Building Good Training Sets -- Data Pre-Processing}
\section{Dealing with missing data}
\subsection{Eliminating samples or features with missing values}
\subsection{Imputing missing values}
\subsection{Understanding the scikit-learn estimator API}
\section{Handling categorical data}
\subsection{Mapping ordinal features}
\subsection{Encoding class labels}
\subsection{Performing one-hot encoding on nominal features}
\section{Partitioning a dataset into training and test sets}
\section{Bringing features onto the same scale}

Now, there are two common approaches to bringing different features onto the same scale: \textit{normalization} and \textit{standardization}. Those terms are often used quite loosely in different fields, and the meaning has to be derived from the context. Most often, \textit{normalization} refers to the rescaling of the features to a range of $[0, 1]$, which is a special case of min-max scaling. To normalize our data, we can simply apply min-max scaling to each feature column, where the new value $x_{norm}^{(i)}$ of a sample $x^{(i)}$ is calculated as follows:

\[
x_{norm}^{(i)} = \frac{x^{(i)} - x_{min}}{x_{max} - x_{min}}
\]

Here, $x^{(i)}$ is a particular sample, and $x_{min}$ and $x_{max}$ are the smallest and largest values in a feature column, respectively.
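
As a quick sanity check of the formula, consider a made-up feature column with the three values 10, 50, and 90, so that $x_{min} = 10$ and $x_{max} = 90$; the sample $x^{(i)} = 50$ is then rescaled to

\[
x_{norm}^{(i)} = \frac{50 - 10}{90 - 10} = 0.5,
\]

while the minimum and maximum map to 0 and 1, respectively.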

[...] Furthermore, standardization maintains useful information about outliers and makes the algorithm less sensitive to them, in contrast to min-max scaling, which scales the data to a limited range of values.

The procedure of standardization can be expressed by the following equation:

\[
x_{std}^{(i)} = \frac{x^{(i)} - \mu_{x}}{\sigma_{x}}
\]

Here, $\mu_{x}$ is the sample mean of a particular feature column and $\sigma_{x}$ is the corresponding standard deviation.
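
To illustrate with made-up numbers, take a feature column with the values 2, 4, 4, 4, 5, 5, 7, and 9, which has mean $\mu_{x} = 5$ and (population) standard deviation $\sigma_{x} = 2$; the sample $x^{(i)} = 9$ is then standardized to

\[
x_{std}^{(i)} = \frac{9 - 5}{2} = 2,
\]

that is, it lies two standard deviations above the column mean, and the standardized column as a whole has zero mean and unit variance.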

\section{Selecting meaningful features}
\subsection{Sparse solutions with L1 regularization}

We recall from \textit{Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn}, that L2 regularization is one approach to reduce the complexity of a model by penalizing large individual weights, where we defined the L2 norm of our weight vector $\mathbf{w}$ as follows:

\[
L2: \lVert \mathbf{w} \rVert^{2}_{2} = \sum_{j=1}^{m} w^{2}_{j}
\]

Another approach to reduce the model complexity is the related \textit{L1 regularization}:

\[
L1: \lVert \mathbf{w} \rVert_{1} = \sum_{j=1}^{m} |w_j|
\]
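
For a concrete (made-up) weight vector $\mathbf{w} = (3, -4)^{T}$, the two penalty terms evaluate to

\[
\lVert \mathbf{w} \rVert^{2}_{2} = 3^{2} + (-4)^{2} = 25, \qquad \lVert \mathbf{w} \rVert_{1} = |3| + |-4| = 7,
\]

which illustrates that the L1 penalty grows only linearly in the individual weights; this is the property that tends to drive many weights exactly to zero and yields the sparse solutions referred to in this subsection.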

\subsection{Sequential feature selection algorithms}

Based on the preceding definition of SBS, we can outline the algorithm in 4 simple steps (a worked illustration of one iteration follows the list):

\begin{enumerate}
\item Initialize the algorithm with $k=d$, where $d$ is the dimensionality of the full feature space $\mathbf{X}_d$.
\item Determine the feature $x^{-}$ that maximizes the criterion: $x^{-} = \text{arg max}\, J(\mathbf{X}_k - x)$, where $x \in \mathbf{X}_k$.
\item Remove the feature $x^{-}$ from the feature set: $\mathbf{X}_{k-1} := \mathbf{X}_k - x^{-}; \quad k := k-1$.
\item Terminate if $k$ equals the number of desired features; if not, go to step 2.
\end{enumerate}
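
As a small, hypothetical illustration of one iteration: starting from $d = 3$ features with $\mathbf{X}_3 = \{x_1, x_2, x_3\}$, step 2 evaluates the criterion on every subset that omits exactly one feature,

\[
J(\{x_1, x_2\}), \quad J(\{x_1, x_3\}), \quad J(\{x_2, x_3\}),
\]

and step 3 removes the feature whose omission yields the largest criterion value (for example, $x^{-} = x_3$ if the first subset scores highest), leaving $\mathbf{X}_2 = \{x_1, x_2\}$ and $k = 2$; the procedure then repeats from step 2 until $k$ equals the desired number of features.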

\section{Assessing feature importance with random forests}
\section{Summary}

\newpage
... to be continued ...
