
Commit 6f79f6d

Update README.md
1 parent 62e01b1 commit 6f79f6d


README.md

Lines changed: 19 additions & 25 deletions
@@ -3,15 +3,10 @@
 - [What to find in this repository](#what-to-find-in-this-repository)
 - [Python libraries](#python-libraries)
   - [scikit-learn](#scikit-learn)
-    - [Overview](#overview)
-    - [Requirements](#requirements)
-    - [Installation](#installation)
   - [numpy](#numpy)
   - [Pandas](#pandas)
   - [Matplotlib](#matplotlib)
   - [seaborn](#seaborn)
-    - [Overview](#overview-1)
-    - [Installation](#installation-1)
 - [machine learning introduction](#machine-learning-introduction)
   - [Supervised learning](#supervised-learning)
   - [Unsupervised learning](#unsupervised-learning)
@@ -25,6 +20,12 @@
   - [How to Detect Overfitting](#how-to-detect-overfitting)
   - [k-Fold Cross-Validation and overfitting](#k-fold-cross-validation-and-overfitting)
   - [How to Prevent Overfitting](#how-to-prevent-overfitting)
+- [machine learning algorithms](#machine-learning-algorithms)
+  - [LinearSVC](#linearsvc)
+  - [Support vector classifier](#support-vector-classifier)
+  - [k-nearest neighbors](#k-nearest-neighbors)
+  - [DBSCAN](#dbscan)
+  - [k-means clustering](#k-means-clustering)
 - [Introduction to arrays using numpy](#introduction-to-arrays-using-numpy)
 - [visualize a dataset using seaborn](#visualize-a-dataset-using-seaborn)
 - [manipulate dataset with pandas](#manipulate-dataset-with-pandas)
@@ -62,14 +63,11 @@ we will use the following libraries
 
 ## scikit-learn
 
-### Overview
 Scikit-Learn, also known as sklearn, is a Python general-purpose machine learning library
-Scikit-Learn is very versatile.
+Scikit-Learn is very versatile.
 
-### Requirements
 sklearn requires python 3
 
-### Installation
 ```
 pip3 install sklearn
 ```
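A quick sanity check after installing, as a minimal sketch (the package imports under the name sklearn):
```
# confirm the library imports and print its version
import sklearn
print(sklearn.__version__)
```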
@@ -92,15 +90,12 @@ matplotlib is a python plotting library
 
 ## seaborn
 
-### Overview
 seaborn is a python data visualization library based on matplotlib
 
-### Installation
 ```
 pip3 install seaborn
 ```
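A minimal sketch to confirm seaborn works once installed (the bundled iris dataset is just an illustrative choice):
```
# load seaborn's bundled iris dataset and draw a scatter matrix colored by species
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")
plt.show()
```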
 
-
 # machine learning introduction
 
 The file [machine_learning_101.pdf](machine_learning_101.pdf) helps people with no machine learning background to better understand machine learning basics
@@ -114,7 +109,8 @@ True machine learning uses algorithms to build a model based on a training set in
 
 ## Supervised learning
 
-The machine learning algorithm learns on a labeled dataset
+The machine learning algorithm learns on a labeled dataset.
+Learning by examples.
 
 The iris dataset and titanic dataset are labeled datasets
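As a minimal sketch of what "labeled" means here, assuming the iris dataset shipped with scikit-learn:
```
# a labeled dataset: each sample in X comes with a known class label in y
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4): 150 samples, 4 features each
print(y[:5])    # the labels (classes) the algorithm learns from
```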
 
@@ -244,7 +240,7 @@ Examples:
 
 ## LinearSVC
 
-LinearSVC class from Scikit Learn library
+LinearSVC is a python class from the Scikit Learn library
 ```
 >>> from sklearn.svm import LinearSVC
 ```
@@ -254,6 +250,7 @@ LinearSVC finds a linear separator. A line separating classes.
 There are many linear separators: it will choose the optimal one, i.e. the one that maximizes our confidence, i.e. the one that maximizes the geometric margin, i.e. the one that maximizes the distance between itself and the closest data point
 Support vectors are the data points which are closest to the line
 
+![SVM.png](resources/SVM.png)
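A minimal usage sketch for the class, assuming the iris dataset as training data:
```
# train LinearSVC on a labeled dataset and classify a new data point
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
clf = LinearSVC()  # finds the linear separator that maximizes the margin
clf.fit(X, y)
print(clf.predict([[5.0, 3.6, 1.3, 0.25]]))  # predicted class for a new sample
```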
 
 ## Support vector classifier
 
@@ -266,12 +263,7 @@ The class SVC is in the module svm of the Scikit Learn library
 ```
 SVC with parameter kernel='linear' is similar to LinearSVC
 
-The SVC classifier finds a linear separator. A line separating classes.
-There are many linear separators: It will choose the optimal one, i.e the one that maximizes our confidence, i.e the one that maximizes the geometrical margin, i.e the one that maximizes the distance between itself and the closest/nearest data point point
-Support vectors are the data points, which are closest to the line
-
-SVC with parameter kernel='linear'
-LinearSVC finds the linear separator that maximizes the distance between itself and the closest/nearest data point point
+SVC with parameter kernel='linear' finds the linear separator that maximizes the distance between itself and the closest data point
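A minimal sketch of the equivalence, again assuming iris as example data; the support_vectors_ attribute exposes the support vectors described above:
```
# SVC with kernel='linear' behaves like LinearSVC
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear')
clf.fit(X, y)
print(clf.support_vectors_[:3])  # the data points closest to the separator
```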
 
 
 ## k-nearest neighbors
@@ -280,8 +272,10 @@ k-NN classification is used with a supervised learning set.
 K is an integer.
 To classify a new data point, this algorithm calculates the distance between the new data point and the other data points.
 The distance can be Euclidean, Manhattan, ....
-Once it knows the K closest neighbors of the new data point, it takes the most common class of these K closest neighbors, and assign that most common class to the new data point.
-So the new data point is assigned to the most common class of its k nearest neighbors. It is assigned to the class to which the majority of its k nearest neighbors belong to.
+Once the algorithm knows the K closest neighbors of the new data point, it takes the most common class of these K closest neighbors, and assigns that class to the new data point.
+So the new data point is assigned to the class to which the majority of its K nearest neighbors belong.
+
+![KNN.png](resources/KNN.png)
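A minimal sketch of k-NN classification as described above, with K=5 and iris as example data:
```
# classify a new data point by majority vote among its K nearest neighbors
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)  # K = 5; Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[5.9, 3.0, 5.1, 1.8]]))  # most common class of the 5 neighbors
```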
 
 ## DBSCAN
 
@@ -306,7 +300,7 @@ We then pick a new arbitrary point and repeat the process.
 Now, it's entirely possible that a point we pick has fewer than minPoints points in its epsilon range, and is also not a part of any other cluster: in that case, it's considered a "noise point" (outlier) not belonging to any cluster.
 epsilon and minPoints remain the same while the algorithm is running.
 
-![]()
+![DBSCAN.png](resources/DBSCAN.png)
 
 ![]()
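A minimal sketch of the epsilon and minPoints behavior described above, on synthetic blob data (an illustrative choice):
```
# density-based clustering: eps is epsilon, min_samples is minPoints
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)
print(set(labels))  # cluster ids; -1 marks noise points (outliers)
```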
 
@@ -315,14 +309,14 @@ epsilon and minPoints remain the same while the algorithm is running.
 k-means clustering splits N data points into K groups (called clusters).
 K ≤ N.
 
-![]()
+![kmeans.png](resources/kmeans.png)
 
 A cluster is a group of data points.
 Each cluster has a center, called the centroid.
 A cluster centroid is the mean of a cluster (average across all the data points in the cluster).
 The radius of a cluster is the maximum distance between all the points and the centroid.
 
-![]()
+![radius.png](resources/radius.png)
 
 Distance between clusters = distance between centroids.
 k-means clustering uses a basic iterative process.
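A minimal sketch with K=3 on synthetic data (an illustrative choice); cluster_centers_ holds the centroids described above:
```
# split N data points into K clusters, each with a centroid
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the centroid (mean) of each cluster
```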
