-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy patharticle-41070.htm
205 lines (193 loc) · 19 KB
/
article-41070.htm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
<!DOCTYPE html>
<html xml:lang="zh-CN" lang="zh-CN">
<head>
<link rel="canonical" href="https://windowsv2ray.github.io/news/article-41070.htm" />
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>机器学习笔记:Python底层实现KNN</title>
<meta name="description" content="小白纯新手,自学了sklearn,想要挑战用python手写机器学习中的主要算法。 借助python自带的pandas库导入数据,很简单。用的数据是下载到本地的红酒集。 代码如下(示例): imp" />
<link rel="icon" href="/assets/website/img/windowsv2ray/favicon.ico" type="image/x-icon"/>
<meta name="author" content="Windows V2ray分享订阅站">
<meta property="og:type" content="article" />
<meta property="og:url" content="https://windowsv2ray.github.io/news/article-41070.htm" />
<meta property="og:site_name" content="Windows V2ray分享订阅站" />
<meta property="og:title" content="机器学习笔记:Python底层实现KNN" />
<meta property="og:image" content="https://windowsv2ray.github.io/uploads/20240604/5b87a6f5ce71e7182137e356b5bf19d6.webp" />
<meta property="og:release_date" content="2025-01-15T09:50:45" />
<meta property="og:updated_time" content="2025-01-15T09:50:45" />
<meta property="og:description" content="小白纯新手,自学了sklearn,想要挑战用python手写机器学习中的主要算法。 借助python自带的pandas库导入数据,很简单。用的数据是下载到本地的红酒集。 代码如下(示例): imp" />
<meta name="applicable-device" content="pc,mobile" />
<meta name="renderer" content="webkit" />
<meta name="force-rendering" content="webkit" />
<meta http-equiv="Cache-Control" content="no-transform" />
<meta name="robots" content="max-image-preview:large" />
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-title" content="机器学习笔记:Python底层实现KNN">
<meta name="format-detection" content="telephone=no">
<link rel="dns-prefetch" href="https:/www.googletagmanager.com">
<link rel="dns-prefetch" href="https://www.googleadservices.com">
<link rel="dns-prefetch" href="https://www.google-analytics.com">
<link rel="dns-prefetch" href="https://pagead2.googlesyndication.com">
<link rel="dns-prefetch" href="https://cm.g.doubleclick.net">
<link rel="stylesheet" href="/assets/website/js/frontend/windowsv2ray/animate/animate.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/bootstrap.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/maicons.css">
<link rel="stylesheet" href="/assets/website/js/frontend/windowsv2ray/owl-carousel/css/owl.carousel.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/theme.css">
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-JN82W0GJX5"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-JN82W0GJX5');
</script>
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3332997411212854"
crossorigin="anonymous"></script>
</head>
<body data-page="detail">
<!-- Back to top button -->
<div class="back-to-top"></div>
<header>
<nav class="navbar navbar-expand-lg navbar-light navbar-float">
<div class="container">
<a href="/" class="navbar-brand">
<span>Windows V2ray</span>
</a>
<button class="navbar-toggler" data-toggle="collapse" data-target="#navbarContent" aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="navbar-collapse collapse" id="navbarContent">
<ul class="navbar-nav ml-lg-4 pt-3 pt-lg-0">
<li class="nav-item">
<a href="/" class="nav-link">首页</a>
</li>
<li class="nav-item">
<a href="/free-nodes/" class="nav-link">免费节点</a>
</li>
<li class="nav-item">
<a href="/paid-subscribe/" class="nav-link">推荐机场</a>
</li>
<li class="nav-item">
<a href="/client.htm" class="nav-link">客户端</a>
</li>
<li class="nav-item">
<a href="/news/" class="nav-link">新闻资讯</a>
</li>
</ul>
</div>
</div>
</nav>
<div class="container mt-5">
<div class="page-banner">
<div class="row justify-content-center align-items-center h-100">
<div class="col-md-10">
<h1 class="text-center">机器学习笔记:Python底层实现KNN</h1>
<nav aria-label="Breadcrumb">
<ul class="breadcrumb justify-content-center py-0 bg-transparent">
<li class="breadcrumb-item"><a href="/">首页</a></li>
<li class="breadcrumb-item"><a href="/news/">新闻资讯</a></li>
<li class="breadcrumb-item active">正文</li>
</ul>
</nav>
</div>
</div>
</div>
</div>
</header>
<main>
<div class="page-section">
<div class="container">
<div class="row">
<div class="col-md-9">
<input type="hidden" id="share-website-info" data-name="" data-url="">
<div id="content_views" class="htmledit_views"> </h1> <p><span style="color:#999aaa;">小白纯新手,自学了sklearn,想要挑战用python手写机器学习中的主要算法。</span></p> <hr/> </h1> <p>借助python自带的pandas库导入数据,很简单。用的数据是下载到本地的红酒集。</p> <p><span style="color:#999aaa;">代码如下(示例):</span></p> <pre><code class="language-c language-python">import pandas as pd def read_xlsx(csv_path): data = pd.read_csv(csv_path) print(data) return data</code></pre> <h2 id="2.%E8%AF%BB%E5%85%A5%E6%95%B0%E6%8D%AE"><a id="2_49" rel="nofollow"></a>2.归一化</h2> <p>KNN算法中将用到距离,因此归一化是一个重要步骤,可以消除数据的量纲。我用了归一化,消除量纲也可以用标准化,但是作为新手,我觉得归一化比较简单。</p> <p>其中最大最小值的计算用到了python中的numpy库,pandas导入的数据是DateFrame形式的,np.array()用来将DateFrame形式转化为可以用numpy计算的ndarray形式。</p> <p><span style="color:#999aaa;">代码如下(示例):</span></p> <pre><code class="language-c language-python">import numpy as np def MinMaxScaler(data): col = data.shape[1] for i in range(0, col-1): arr = data.iloc[:, i] arr = np.array(arr) #将DataFrame形式转化为ndarray形式,方便后续用numpy计算 min = np.min(arr) max = np.max(arr) arr = (arr-min)/(max-min) data.iloc[:, i] = arr return data</code></pre> <h2>3.分训练集和测试集</h2> <p>先将数据值和标签值分别用x和y划分开,设置随机数种子random_state,若不设置,则每次运行的结果会不相同。test_size表示测试集比例。</p> <pre><code class="language-python">def train_test_split(data, test_size=0.2, random_state=None): col = data.shape[1] x = data.iloc[:, 0:col-1] y = data.iloc[:, -1] x = np.array(x) y = np.array(y) # 设置随机种子,当随机种子非空时,将锁定随机数 if random_state: np.random.seed(random_state) # 将样本集的索引值进行随机打乱 # permutation随机生成0-len(data)随机序列 shuffle_indexs = np.random.permutation(len(x)) # 提取位于样本集中20%的那个索引值 test_size = int(len(x) * test_size) # 将随机打乱的20%的索引值赋值给测试索引 test_indexs = shuffle_indexs[:test_size] # 将随机打乱的80%的索引值赋值给训练索引 train_indexs = shuffle_indexs[test_size:] # 根据索引提取训练集和测试集 x_train = x[train_indexs] y_train = y[train_indexs] x_test = x[test_indexs] y_test = y[test_indexs] # 将切分好的数据集返回出去 # print(y_train) return x_train, x_test, y_train, y_test</code></pre> <h2>4.计算距离</h2> <p>此处用到欧氏距离,pow()函数用来计算幂次方。length指属性值数量,在计算最近邻时用到。</p> <pre><code class="language-python">def CountDistance(train,test,length): distance = 0 for x in range(length): distance += pow(test[x] - train[x], 2)**0.5 return distance</code></pre> <h2>5.选择最近邻</h2> <p>计算测试集中的一条数据和训练集中的每一条数据的距离,选择距离最近的k个,以少数服从多数原则得出标签值。其中argsort返回的是数值从小到大的索引值,为了找到对应的标签值。</p> <p>tip:用numpy计算众数的方法</p> <pre><code class="language-html">import numpy as np #bincount():统计非负整数的个数,不能统计浮点数 counts = np.bincount(nums) #返回众数 np.argmax(counts)</code></pre> <p>少数服从多数原则,计算众数,返回标签值。</p> <pre><code class="language-python">def getNeighbor(x_train,test,y_train,k): distance = [] #测试集的维度 length = x_train.shape[1] #测试集合所有训练集的距离 for x in range(x_train.shape[0]): dist = CountDistance(test, x_train[x], length) distance.append(dist) distance = np.array(distance) #排序 distanceSort = distance.argsort() # distance.sort(key= operator.itemgetter(1)) # print(len(distance)) # print(distanceSort[0]) neighbors =[] for x in range(k): labels = y_train[distanceSort[x]] neighbors.append(labels) # print(labels) counts = np.bincount(neighbors) label = np.argmax(counts) # print(label) return label</code></pre> <p>调用函数时:</p> <pre><code class="language-python">getNeighbor(x_train,x_test[0],y_train,3)</code></pre> <h2> 6.计算准确率</h2> <p>用以上KNN算法预测测试集中每一条数据的标签值,存入result数组,将预测结果与真实值比较,计算预测正确的个数与总体个数的比值,即为准确率。</p> <pre><code class="language-python">def getAccuracy(x_test,x_train,y_train,y_test): result = [] k = 3 # arr_label = getNeighbor(x_train, x_test[0], y_train, k) for x in range(len(x_test)): arr_label = getNeighbor(x_train, x_test[x], y_train, k) result.append(arr_label) correct = 0 for x in range(len(y_test)): if result[x] == y_test[x]: correct += 1 # print(correct) accuracy = (correct / float(len(y_test))) * 100.0 print("Accuracy:", accuracy, "%") return accuracy</code></pre> <hr/> <h2><a id="_67" rel="nofollow"></a>总结</h2> <p>KNN算是机器学习中最简单的算法,实现起来相对简单,但对于我这样的新手,还是花费了大半天时间才整出来。</p> <p>在github上传了项目:<a href="http://www.m6000.cn/wp-content/themes/begin%20lts/inc/go.php?url=https://github.com/chenyi369/KNN" rel="nofollow">https://github.com/chenyi369/KNN</a></p> </div> <div class="clearfix"></div>
<div class="col-md-12 mt-5">
<p>上一个:<a href="/news/article-41069.htm">spring常用配置profile(设置不同的运行环境)使用纪要</a></p>
<p>下一个:<a href="/news/article-41572.htm">EFCore 的 DbFirst 模式</a></p>
</div>
</div>
<div class="col-md-3">
<div class="panel panel-default">
<div class="panel-heading">
<h3 class="panel-title">热门文章</h3>
</div>
<div class="panel-body">
<ul class="p-0 x-0" style="list-style: none;margin: 0;padding: 0;">
<li class="py-2"><a href="/news/article-47998.htm" title="开一个宠物食品加工厂需要什么(开宠物食品厂需要什么手续)">开一个宠物食品加工厂需要什么(开宠物食品厂需要什么手续)</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-11-shadowrocket-node.htm" title="「2月11日」最高速度20.6M/S,2025年V2ray/Clash/Shadowrocket/SSR每天更新免费节点订阅链接">「2月11日」最高速度20.6M/S,2025年V2ray/Clash/Shadowrocket/SSR每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/news/article-45053.htm" title="人用疫苗和兽用疫苗哪个好 人用疫苗和兽用疫苗哪个好一点">人用疫苗和兽用疫苗哪个好 人用疫苗和兽用疫苗哪个好一点</a></li>
<li class="py-2"><a href="/free-nodes/2025-1-21-free-clash.htm" title="「1月21日」最高速度22.9M/S,2025年Clash/Shadowrocket/V2ray/SSR每天更新免费节点订阅链接">「1月21日」最高速度22.9M/S,2025年Clash/Shadowrocket/V2ray/SSR每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-26-free-ssr-node.htm" title="「2月26日」最高速度19.4M/S,2025年Shadowrocket/Clash/V2ray/SSR每天更新免费节点订阅链接">「2月26日」最高速度19.4M/S,2025年Shadowrocket/Clash/V2ray/SSR每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-23-free-node-subscribe.htm" title="「2月23日」最高速度20.6M/S,2025年Shadowrocket/SSR/V2ray/Clash每天更新免费节点订阅链接">「2月23日」最高速度20.6M/S,2025年Shadowrocket/SSR/V2ray/Clash每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/free-nodes/2025-3-4-clash-windows.htm" title="「3月4日」最高速度21.5M/S,2025年V2ray/Shadowrocket/Clash/SSR每天更新免费节点订阅链接">「3月4日」最高速度21.5M/S,2025年V2ray/Shadowrocket/Clash/SSR每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-6-free-node-subscribe-links.htm" title="「2月6日」最高速度18.9M/S,2025年Clash/SSR/Shadowrocket/V2ray每天更新免费节点订阅链接">「2月6日」最高速度18.9M/S,2025年Clash/SSR/Shadowrocket/V2ray每天更新免费节点订阅链接</a></li>
<li class="py-2"><a href="/news/article-40087.htm" title="宠物领养协议书怎么生效图片 宠物领养协议书怎么生效图片大全">宠物领养协议书怎么生效图片 宠物领养协议书怎么生效图片大全</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-21-free-ssr-subscribe.htm" title="「2月21日」最高速度19.5M/S,2025年Shadowrocket/V2ray/SSR/Clash每天更新免费节点订阅链接">「2月21日」最高速度19.5M/S,2025年Shadowrocket/V2ray/SSR/Clash每天更新免费节点订阅链接</a></li>
</ul>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">
<h3 class="panel-title">归纳</h3>
</div>
<div class="panel-body">
<ul class="p-0 x-0" style="list-style: none;margin: 0;padding: 0;">
<li class="py-2">
<h4><span class="badge" style="float: right;">12</span> <a href="/date/2025-03/" title="2025-03 归档">2025-03</a></h4>
</li>
<li class="py-2">
<h4><span class="badge" style="float: right;">84</span> <a href="/date/2025-02/" title="2025-02 归档">2025-02</a></h4>
</li>
<li class="py-2">
<h4><span class="badge" style="float: right;">83</span> <a href="/date/2025-01/" title="2025-01 归档">2025-01</a></h4>
</li>
</ul>
</div>
</div>
</div>
</div>
</div> <!-- .container -->
</div> <!-- .page-section -->
</main>
<footer class="page-footer">
<div class="container">
<div class="row">
<div class="col-sm-6 py-2">
<p id="copyright">
<p>
<a href="/">首页</a> |
<a href="/free-node/">免费节点</a> |
<a href="/news/">新闻资讯</a> |
<a href="/about-us.htm">关于我们</a> |
<a href="/disclaimer.htm">免责申明</a> |
<a href="/privacy.htm">隐私申明</a> |
<a href="/sitemap.xml">网站地图</a>
</p>
Windows V2ray分享订阅站 版权所有 Powered by WordPress
</p>
</div>
<div class="col-sm-6 py-2 text-right">
<div class="d-inline-block px-3">
<a href="#">Privacy</a>
</div>
<div class="d-inline-block px-3">
<a href="#">Contact Us</a>
</div>
</div>
</div>
</div> <!-- .container -->
</footer> <!-- .page-footer -->
<script src="/assets/website/js/frontend/windowsv2ray/jquery-3.5.1.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/bootstrap.bundle.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/wow/wow.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/owl-carousel/js/owl.carousel.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/waypoints/jquery.waypoints.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/animateNumber/jquery.animateNumber.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/theme.js"></script>
<script src="https://www.freeclashnode.com/assets/js/frontend/invite-url.js"></script>
<script src="/assets/website/js/frontend/G.js"></script>
</body>
</html>