- [Introduction to arrays using numpy](#introduction-to-arrays-using-numpy)
- [visualize a dataset using seaborn](#visualize-a-dataset-using-seaborn)
- [manipulate dataset with pandas](#manipulate-dataset-with-pandas)

we will use the following libraries

## scikit-learn

### Overview

Scikit-Learn, also known as sklearn, is a general-purpose Python machine learning library

Scikit-Learn is very versatile.

### Requirements

sklearn requires Python 3

### Installation

```
pip3 install scikit-learn
```
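
As a quick check that the installation works, here is a minimal sketch of the fit/predict pattern shared by scikit-learn estimators (the classifier choice and toy data are illustrative, not from this repo):

```
# every scikit-learn estimator exposes the same fit/predict interface
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1]]           # two training samples (features)
y = [0, 1]                     # their labels
clf = DecisionTreeClassifier()
clf.fit(X, y)                  # learn a model from the labeled data
print(clf.predict([[2, 2]]))   # -> [1]
```
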

## matplotlib

matplotlib is a Python plotting library
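
A minimal sketch of a matplotlib plot (the data points are made up for illustration):

```
import matplotlib.pyplot as plt

# draw a simple line chart
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```
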
## seaborn

### Overview

seaborn is a Python data visualization library based on matplotlib

### Installation

```
pip3 install seaborn
```
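
A minimal sketch of a seaborn plot, assuming the "iris" example dataset that ships with seaborn:

```
import matplotlib.pyplot as plt
import seaborn as sns

# load seaborn's built-in iris example dataset and plot two of its features
iris = sns.load_dataset("iris")
sns.scatterplot(data=iris, x="sepal_length", y="sepal_width", hue="species")
plt.show()
```
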
# machine learning introduction
The file [machine_learning_101.pdf](machine_learning_101.pdf) helps people with no machine learning background better understand machine learning basics

True machine learning uses algorithms to build a model based on a training set in order to make predictions

## Supervised learning

The machine learning algorithm learns on a labeled dataset.

Learning by examples.

The iris dataset and the titanic dataset are labeled datasets
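
For example, here is a sketch of the labeled iris dataset as it ships with scikit-learn: each row of measurements (the features) comes with a class label:

```
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data[0])        # features of the first flower: [5.1 3.5 1.4 0.2]
print(iris.target[0])      # its label: 0
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```
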
## LinearSVC

LinearSVC is a Python class from the Scikit Learn library

```
>>> from sklearn.svm import LinearSVC
```

LinearSVC finds a linear separator. A line separating classes.
There are many linear separators: LinearSVC chooses the optimal one, i.e. the one that maximizes our confidence, i.e. the one that maximizes the geometrical margin, i.e. the one that maximizes the distance between the line and the closest data point.
Support vectors are the data points closest to the line.
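
A minimal sketch of LinearSVC on made-up, linearly separable 2-D points:

```
from sklearn.svm import LinearSVC

X = [[0, 0], [0, 1], [2, 2], [2, 3]]       # toy 2-D data points
y = [0, 0, 1, 1]                           # two linearly separable classes
clf = LinearSVC()
clf.fit(X, y)                              # finds the separating line
print(clf.predict([[0, 0.5], [2, 2.5]]))   # -> [0 1]
```
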
## Support vector classifier

The class SVC is in the module svm of the Scikit Learn library

```
>>> from sklearn.svm import SVC
```

SVC with parameter kernel='linear' is similar to LinearSVC

SVC with parameter kernel='linear' finds the linear separator that maximizes the distance between itself and the closest data point
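
A sketch comparing the two classifiers on the same toy data as above; with a linear kernel both should find a similar separator:

```
from sklearn.svm import SVC, LinearSVC

X = [[0, 0], [0, 1], [2, 2], [2, 3]]
y = [0, 0, 1, 1]

svc = SVC(kernel='linear').fit(X, y)
linear_svc = LinearSVC().fit(X, y)
print(svc.predict([[2, 2.5]]))          # -> [1]
print(linear_svc.predict([[2, 2.5]]))   # -> [1]
print(svc.support_vectors_)             # the data points closest to the line
```
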
## k-nearest neighbors

k-NN classification is used with a supervised learning set.
K is an integer.
To classify a new data point, this algorithm calculates the distance between the new data point and the other data points.
The distance can be Euclidean, Manhattan, ....
Once the algorithm knows the K closest neighbors of the new data point, it takes the most common class among these K closest neighbors and assigns that class to the new data point.
So the new data point is assigned to the class to which the majority of its K nearest neighbors belong.
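
A minimal sketch with K = 3 on made-up points:

```
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]   # training points
y = [0, 0, 0, 1, 1, 1]                                 # their classes
knn = KNeighborsClassifier(n_neighbors=3)              # K = 3
knn.fit(X, y)
# the 3 nearest neighbors of (1, 1) all belong to class 0
print(knn.predict([[1, 1]]))   # -> [0]
```
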
## DBSCAN

We then pick a new arbitrary point and repeat the process.
Now, it's entirely possible that a point we pick has fewer than minPoints points in its epsilon range, and is also not a part of any other cluster: in that case, it's considered a "noise point" (outlier) not belonging to any cluster.
epsilon and minPoints remain the same while the algorithm is running.
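
A minimal sketch with scikit-learn's DBSCAN class, whose eps and min_samples parameters correspond to epsilon and minPoints (the toy data is made up for illustration):

```
from sklearn.cluster import DBSCAN

X = [[1, 1], [1.2, 0.8], [0.9, 1.1],   # a dense group
     [8, 8], [8.1, 7.9], [7.9, 8.2],   # another dense group
     [50, 50]]                         # an isolated point
db = DBSCAN(eps=1.0, min_samples=3).fit(X)
print(db.labels_)   # -> [ 0  0  0  1  1  1 -1]; -1 marks a noise point
```
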

## k-means clustering

k-means clustering splits N data points into K groups (called clusters).
K ≤ N.
A cluster is a group of data points.
Each cluster has a center, called the centroid.
A cluster centroid is the mean of a cluster (average across all the data points in the cluster).
The radius of a cluster is the maximum distance between all the points and the centroid.
Distance between clusters = distance between centroids.
k-means clustering uses a basic iterative process.
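
A minimal sketch with scikit-learn's KMeans class and K = 2 (the toy data is made up for illustration):

```
from sklearn.cluster import KMeans

X = [[1, 1], [1.5, 1.8], [1, 0.6],    # first group of points
     [8, 8], [8.5, 8.2], [7.9, 9]]    # second group of points
km = KMeans(n_clusters=2, n_init=10).fit(X)   # K = 2
print(km.labels_)            # cluster assignment of each data point
print(km.cluster_centers_)   # the two centroids (cluster means)
```
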