<!DOCTYPE html>
<html xml:lang="zh-CN" lang="zh-CN">
<head>
<link rel="canonical" href="https://windowsv2ray.github.io/news/article-30442.htm" />
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Three Ways to Read and Save Data in PySpark</title>
<meta name="description" content="Three ways to read data in PySpark (Parquet on HDFS, Hive tables, CSV on HDFS) and three ways to save it (Parquet on HDFS, Hive tables, CSV on HDFS)." />
<link rel="icon" href="/assets/website/img/windowsv2ray/favicon.ico" type="image/x-icon"/>
<meta name="author" content="Windows V2ray分享订阅站">
<meta property="og:type" content="article" />
<meta property="og:url" content="https://windowsv2ray.github.io/news/article-30442.htm" />
<meta property="og:site_name" content="Windows V2ray分享订阅站" />
<meta property="og:title" content="Three Ways to Read and Save Data in PySpark" />
<meta property="og:image" content="https://windowsv2ray.github.io/uploads/20240604/1ba9f867eed9a761cd113ecfed10e23a.webp" />
<meta property="og:release_date" content="2024-12-24T09:42:09" />
<meta property="og:updated_time" content="2024-12-24T09:42:09" />
<meta property="og:description" content="Three ways to read data in PySpark (Parquet on HDFS, Hive tables, CSV on HDFS) and three ways to save it (Parquet on HDFS, Hive tables, CSV on HDFS)." />
<meta name="applicable-device" content="pc,mobile" />
<meta name="renderer" content="webkit" />
<meta name="force-rendering" content="webkit" />
<meta http-equiv="Cache-Control" content="no-transform" />
<meta name="robots" content="max-image-preview:large" />
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-title" content="Three Ways to Read and Save Data in PySpark">
<meta name="format-detection" content="telephone=no">
<link rel="dns-prefetch" href="https://www.googletagmanager.com">
<link rel="dns-prefetch" href="https://www.googleadservices.com">
<link rel="dns-prefetch" href="https://www.google-analytics.com">
<link rel="dns-prefetch" href="https://pagead2.googlesyndication.com">
<link rel="dns-prefetch" href="https://cm.g.doubleclick.net">
<link rel="stylesheet" href="/assets/website/js/frontend/windowsv2ray/animate/animate.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/bootstrap.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/maicons.css">
<link rel="stylesheet" href="/assets/website/js/frontend/windowsv2ray/owl-carousel/css/owl.carousel.css">
<link rel="stylesheet" href="/assets/website/css/windowsv2ray/theme.css">
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-JN82W0GJX5"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-JN82W0GJX5');
</script>
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-3332997411212854"
crossorigin="anonymous"></script>
</head>
<body data-page="detail">
<!-- Back to top button -->
<div class="back-to-top"></div>
<header>
<nav class="navbar navbar-expand-lg navbar-light navbar-float">
<div class="container">
<a href="/" class="navbar-brand">
<span>Windows V2ray</span>
</a>
<button class="navbar-toggler" data-toggle="collapse" data-target="#navbarContent" aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="navbar-collapse collapse" id="navbarContent">
<ul class="navbar-nav ml-lg-4 pt-3 pt-lg-0">
<li class="nav-item">
<a href="/" class="nav-link">Home</a>
</li>
<li class="nav-item">
<a href="/free-nodes/" class="nav-link">Free Nodes</a>
</li>
<li class="nav-item">
<a href="/paid-subscribe/" class="nav-link">Recommended Providers</a>
</li>
<li class="nav-item">
<a href="/client.htm" class="nav-link">Clients</a>
</li>
<li class="nav-item">
<a href="/news/" class="nav-link">News</a>
</li>
</ul>
</div>
</div>
</nav>
<div class="container mt-5">
<div class="page-banner">
<div class="row justify-content-center align-items-center h-100">
<div class="col-md-10">
<h1 class="text-center">Three Ways to Read and Save Data in PySpark</h1>
</div>
</div>
</div>
</div>
</header>
<main>
<div class="page-section">
<div class="container">
<div class="row">
<div class="col-md-9">
<input type="hidden" id="share-website-info" data-name="" data-url="">
<div id="content_views" class="markdown_views prism-atom-one-light"> <h1>Reading data</h1> <h2> <a id="hdfs_1" rel="nofollow"></a>Method 1: Read from HDFS</h2> <pre><code class="prism language-python"># -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

# Hive support is required for the Hive read/write methods below
spark = SparkSession.builder.enableHiveSupport().appName("test").getOrCreate()

### Data loading method 1: load Parquet data from HDFS
input_path = "/aaa/bbb/ccc"
data = spark.read.parquet(input_path)
data.show(5)
+-------------------+------+--------------------+
|         START_TIME|amount|           payerCode|
+-------------------+------+--------------------+
|2019-06-28 21:04:37|  10.7|    692200000XXXXXXX|
|2018-11-24 20:15:40|  19.9|    602200000XXXXXXX|
|2019-06-19 12:33:14|   2.0|    692200000XXXXXXX|
|2019-07-03 23:04:12|  5.27|    622200000XXXXXXX|
|2018-11-26 21:26:30|   2.0|    622200000XXXXXXX|
+-------------------+------+--------------------+
</code></pre> <h2> <a id="_41" rel="nofollow"></a>Method 2: Read from a database (Hive)</h2> <pre><code>####### Build the SQL query. It uses the same syntax as a Hive query, so WHERE and other clauses can be added.
# hive_database and hive_table2 name the source database and table
# (the write section below sets them to "testt0618" and "lll").
# HiveContext is deprecated since Spark 2.0; the SparkSession built with
# enableHiveSupport() in Method 1 runs the query directly.
hive_read = "select * from {}.{}".format(hive_database, hive_table2)

####### Data queried from Hive with SQL comes back directly as a DataFrame
read_df = spark.sql(hive_read)
read_df.show(5)
+-------------------+------+--------------------+
|         START_TIME|amount|           payerCode|
+-------------------+------+--------------------+
|2019-06-28 21:04:37|  10.7|    692200000XXXXXXX|
|2018-11-24 20:15:40|  19.9|    602200000XXXXXXX|
|2019-06-19 12:33:14|   2.0|    692200000XXXXXXX|
|2019-07-03 23:04:12|  5.27|    622200000XXXXXXX|
|2018-11-26 21:26:30|   2.0|    622200000XXXXXXX|
+-------------------+------+--------------------+
</code></pre> <h2> <a id="3hdfscsv_66" rel="nofollow"></a>Method 3: Read a CSV file on HDFS</h2> <pre><code># filepath: path of the CSV file (or directory of CSV files) on HDFS
# header=True: the first line holds the column names
# inferSchema=True: guess each column's type from the data
csv_df = spark.read.csv(filepath, header=True, inferSchema=True, sep=",")
</code></pre> <h1>Writing data</h1> <h2> <a id="1_parquenthdfs_73" rel="nofollow"></a>Method 1: Save to HDFS in Parquet format</h2> <pre><code># data1 is the DataFrame to save (defined in Method 2 below as data.limit(10)).
# output is the target HDFS directory; do not point it at a path this job also reads from.
# PySpark takes the save mode as a string; SaveMode.Overwrite is the Scala/Java API.
data1.write.mode("overwrite").parquet(output)
</code></pre> <h2> <a id="2Tablehive_79" rel="nofollow"></a>Method 2: Save as a table in the Hive database</h2> <pre><code>##### Save the data into the database
hive_database = "testt0618"
data1 = data.limit(10)
</code></pre> <h4> <a id="1_saveAsTablehive_88" rel="nofollow"></a>1: Save to the Hive database with saveAsTable()</h4> <pre><code>hive_table1 = "ii"
# format("hive") stores a Hive table; mode("overwrite") replaces an existing one
data1.write.format("hive").mode("overwrite").saveAsTable("{}.{}".format(hive_database, hive_table1))
</code></pre> <h4> <a id="2sqlhive_96" rel="nofollow"></a>2: Save to the Hive database with a SQL statement</h4> <pre><code>hive_table2 = "lll"
# createOrReplaceTempView() replaces the deprecated registerTempTable()
data1.createOrReplaceTempView("test_hive")
# CREATE TABLE ... AS SELECT fails if the target table already exists
spark.sql("create table {}.{} as select * from test_hive".format(hive_database, hive_table2))
</code></pre> <h2> <a id="3csvhdfs_104" rel="nofollow"></a>Method 3: Save to HDFS in CSV format</h2> <pre><code>output = "/aaa/bbb/ccc"
# coalesce(1) yields a single part file inside the output directory;
# sep="#" sets the field delimiter and header="true" writes the column names
data1.coalesce(1).write.option("sep", "#").option("header", "true").csv(output + "_text", mode="overwrite")
</code></pre> </div> <div class="clearfix"></div>
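<p>The three read methods can also be packaged as small helper functions. This is a minimal sketch, not the original author's code: the names <code>read_parquet</code>, <code>read_hive_table</code> and <code>read_csv</code> and the optional <code>where</code> parameter are hypothetical, and each helper assumes it is handed a SparkSession created with <code>enableHiveSupport()</code>. The calls inside (<code>spark.read.parquet</code>, <code>spark.sql</code>, <code>spark.read.csv</code>) are standard PySpark APIs.</p>
<pre><code class="prism language-python">def read_parquet(spark, path):
    """Method 1: load a Parquet file or directory (e.g. on HDFS) as a DataFrame."""
    return spark.read.parquet(path)


def read_hive_table(spark, database, table, where=None):
    """Method 2: query a Hive table with SQL; the result is a DataFrame.

    `where` (hypothetical parameter) is an optional SQL condition appended
    as a WHERE clause, exactly as it would be in a Hive query.
    """
    query = "SELECT * FROM {}.{}".format(database, table)
    if where:
        query += " WHERE " + where
    return spark.sql(query)


def read_csv(spark, path, sep=","):
    """Method 3: read delimited text files that start with a header row."""
    # header=True takes column names from the first line;
    # inferSchema=True costs an extra pass over the data to guess column types.
    return spark.read.csv(path, header=True, inferSchema=True, sep=sep)
</code></pre>
<p>For example, <code>read_hive_table(spark, "testt0618", "lll", where="payerCode IS NOT NULL")</code> runs <code>SELECT * FROM testt0618.lll WHERE payerCode IS NOT NULL</code>. Because the query is assembled with string formatting, pass only trusted database, table and condition strings.</p>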
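<p>The three write methods can be wrapped the same way. Again a minimal sketch under stated assumptions: the function names are hypothetical, the default view name <code>test_hive</code> is borrowed from the SQL example above, and each helper expects an existing DataFrame (and, for the SQL variant, a Hive-enabled SparkSession). Only documented <code>DataFrameWriter</code> and <code>spark.sql</code> calls are used.</p>
<pre><code class="prism language-python">def save_parquet(df, output):
    """Method 1: write Parquet files to `output`, replacing existing data there."""
    # PySpark accepts the save mode as a string; SaveMode.Overwrite is the Scala/Java enum.
    df.write.mode("overwrite").parquet(output)


def save_as_hive_table(df, database, table):
    """Method 2, option 1: create or overwrite a Hive table with saveAsTable()."""
    df.write.format("hive").mode("overwrite").saveAsTable("{}.{}".format(database, table))


def create_table_with_sql(spark, df, database, table, view_name="test_hive"):
    """Method 2, option 2: expose df as a temp view, then CREATE TABLE ... AS SELECT."""
    df.createOrReplaceTempView(view_name)
    # Raises an error if database.table already exists.
    spark.sql("CREATE TABLE {}.{} AS SELECT * FROM {}".format(database, table, view_name))


def save_single_csv(df, output, sep="#"):
    """Method 3: write one CSV part file, with a header row, inside the directory `output`."""
    # coalesce(1) moves all rows into one partition, so only use it for small results.
    df.coalesce(1).write.option("sep", sep).option("header", "true").csv(output, mode="overwrite")
</code></pre>
<p>For instance, <code>save_single_csv(data1, "/aaa/bbb/ccc_text")</code> reproduces Method 3. Passing the DataFrame and session in as arguments, rather than relying on globals such as <code>spark</code> and <code>data1</code>, keeps each helper usable from any job and testable with a mock object instead of a running cluster.</p>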
<div class="col-md-12 mt-5">
<p>Previous: <a href="/news/article-29762.htm">Where to buy pet food wholesale cheaply (where is the pet food wholesale market)</a></p>
<p>Next: <a href="/news/article-30443.htm">Using the Arrays utility class in Java</a></p>
</div>
</div>
<div class="col-md-3">
<div class="panel panel-default">
<div class="panel-heading">
<h3 class="panel-title">Popular Articles</h3>
</div>
<div class="panel-body">
<ul class="p-0 x-0" style="list-style: none;margin: 0;padding: 0;">
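<li class="py-2"><a href="/news/article-57050.htm" title="Where to find wholesale dog food prices (where dog food is sold)">Where to find wholesale dog food prices (where dog food is sold)</a></li>
<li class="py-2"><a href="/free-nodes/2025-1-4-shadowrocket-node.htm" title="[Jan 4] Top speed 21.5 M/s: 2025 V2ray/SSR/Clash/Shadowrocket free node subscription links, updated daily">[Jan 4] Top speed 21.5 M/s: 2025 V2ray/SSR/Clash/Shadowrocket free node subscription links, updated daily</a></li>
<li class="py-2"><a href="/free-nodes/2025-2-10-free-ssr-node.htm" title="[Feb 10] Top speed 19.6 M/s: 2025 Clash/SSR/Shadowrocket/V2ray free node subscription links, updated daily">[Feb 10] Top speed 19.6 M/s: 2025 Clash/SSR/Shadowrocket/V2ray free node subscription links, updated daily</a></li>
<li class="py-2"><a href="/news/article-20695.htm" title="Are animal vaccines a vaccine category, and why do they have no nutritional value? (What animal vaccines include)">Are animal vaccines a vaccine category, and why do they have no nutritional value? (What animal vaccines include)</a></li>
<li class="py-2"><a href="/news/article-61069.htm" title="Definition of animal epidemic prevention (the concept of animal epidemic prevention)">Definition of animal epidemic prevention (the concept of animal epidemic prevention)</a></li>
<li class="py-2"><a href="/news/article-25304.htm" title="Download a sample home pet-boarding agreement (are pet-boarding agreements legally binding?)">Download a sample home pet-boarding agreement (are pet-boarding agreements legally binding?)</a></li>
<li class="py-2"><a href="/news/article-65873.htm" title="What is a good name for an animal hospital? (good names for an animal hospital)">What is a good name for an animal hospital? (good names for an animal hospital)</a></li>
<li class="py-2"><a href="/news/article-25770.htm" title="How much does deworming a cat cost? (cost of deworming a pet cat)">How much does deworming a cat cost? (cost of deworming a pet cat)</a></li>
<li class="py-2"><a href="/news/article-62243.htm" title="How much to deworm a cat (cost per deworming)">How much to deworm a cat (cost per deworming)</a></li>
<li class="py-2"><a href="/news/article-27163.htm" title="How to make money retailing cat and dog food (sales and profit of pet food)">How to make money retailing cat and dog food (sales and profit of pet food)</a></li>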
</ul>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">
<h3 class="panel-title">Archives</h3>
</div>
<div class="panel-body">
<ul class="p-0 x-0" style="list-style: none;margin: 0;padding: 0;">
<li class="py-2">
<h4><span class="badge" style="float: right;">6</span> <a href="/date/2025-03/" title="2025-03 archive">2025-03</a></h4>
</li>
<li class="py-2">
<h4><span class="badge" style="float: right;">84</span> <a href="/date/2025-02/" title="2025-02 archive">2025-02</a></h4>
</li>
<li class="py-2">
<h4><span class="badge" style="float: right;">92</span> <a href="/date/2025-01/" title="2025-01 archive">2025-01</a></h4>
</li>
<li class="py-2">
<h4><span class="badge" style="float: right;">87</span> <a href="/date/2024-12/" title="2024-12 archive">2024-12</a></h4>
</li>
</ul>
</div>
</div>
</div>
</div>
</div> <!-- .container -->
</div> <!-- .page-section -->
</main>
<footer class="page-footer">
<div class="container">
<div class="row">
<div class="col-sm-6 py-2">
<p id="copyright">
<p>
<a href="/">Home</a> |
<a href="/free-node/">Free Nodes</a> |
<a href="/news/">News</a> |
<a href="/about-us.htm">About Us</a> |
<a href="/disclaimer.htm">Disclaimer</a> |
<a href="/privacy.htm">Privacy Statement</a> |
<a href="/sitemap.xml">Sitemap</a>
</p>
Windows V2ray分享订阅站. All rights reserved. Powered by WordPress
</p>
</div>
<div class="col-sm-6 py-2 text-right">
<div class="d-inline-block px-3">
<a href="#">Privacy</a>
</div>
<div class="d-inline-block px-3">
<a href="#">Contact Us</a>
</div>
</div>
</div>
</div> <!-- .container -->
</footer> <!-- .page-footer -->
<script src="/assets/website/js/frontend/windowsv2ray/jquery-3.5.1.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/bootstrap.bundle.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/wow/wow.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/owl-carousel/js/owl.carousel.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/waypoints/jquery.waypoints.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/animateNumber/jquery.animateNumber.min.js"></script>
<script src="/assets/website/js/frontend/windowsv2ray/theme.js"></script>
<script src="https://www.freeclashnode.com/assets/js/frontend/invite-url.js"></script>
<script src="/assets/website/js/frontend/G.js"></script>
</body>
</html>