Skip to content

Commit 1ad8bf4

Browse files
authored
Update README.md
1 parent e202824 commit 1ad8bf4

File tree

1 file changed

+7
-3
lines changed

1 file changed

+7
-3
lines changed

README.md

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
- [seaborn](#seaborn)
1010
- [machine learning introduction](#machine-learning-introduction)
1111
- [Supervised learning](#supervised-learning)
12+
- [labeled dataset examples](#labeled-dataset-examples)
1213
- [Unsupervised learning](#unsupervised-learning)
1314
- [Clustering](#clustering)
1415
- [Classification](#classification)
@@ -112,6 +113,8 @@ True machine learning use algorithms to build a model based on a training set in
112113
The machine learning algorithm learns on a labeled dataset.
113114
Learning by examples.
114115

116+
## labeled dataset examples
117+
115118
The iris dataset and titanic dataset are labeled dataset
116119

117120
The iris dataset contains a set of 150 records under five attributes: petal length, petal width, sepal length, sepal width and species.
@@ -151,6 +154,8 @@ Classification uses supervised learning.
151154
The machine learning algorithm learns on a labeled dataset
152155
We know the labels from the training set
153156

157+
KNN (k-nearest neighbors) and Support vector classifier (SVC) are supervised learning algorithms for classification.
158+
154159
## machine learning model
155160

156161
Once a machine learning model is built with a training set, it can be used to process new data points to make predictions or decisions
@@ -304,8 +309,6 @@ epsilon and minPoints remain the same while the algorithm is running.
304309

305310
![DBSCAN.png](resources/DBSCAN.png)
306311

307-
![DBSCAN_mouse.png](resources/DBSCAN_mouse.png)
308-
309312
## k-means clustering
310313

311314
k-means clustering splits N data points into K groups (called clusters).
@@ -332,8 +335,9 @@ The objective is to minimize the variance within each cluster.
332335
Clusters are well separated from each other.
333336
It maximizes the average inter-cluster distance.
334337

338+
k-means clusters tend to be of the same size. size refers to the area. size doesnt refer to the number od elements. Two clusters of the same area do not have to have the same number of elements (except if your data set has the same density)
335339
![kmeans_mouse.png](resources/kmeans_mouse.png)
336-
340+
The tendency of k-means to produce equal-sized clusters leads to bad results here
337341

338342
# Introduction to arrays using numpy
339343

0 commit comments

Comments
 (0)