This article walks through the "Spark MaprLab-Auction Data" example. The explanation is simple and clear and easy to follow, so let's work through it step by step.

I. Environment Setup
1. Install Hadoop
2. Install Spark
3. Start Hadoop
4. Start Spark
II. Example Analysis
1. Data Preparation
Download DEV360DATA.zip from the MapR website and upload it to the server.
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ cd test-data/
[hadoop@hftclclw0001 test-data]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6/test-data/DEV360Data
[hadoop@hftclclw0001 DEV360Data]$ ll
total 337940
-rwxr-xr-x 1 hadoop root    575014 Jun 24 16:18 auctiondata.csv    => the data used in this exercise
-rw-r--r-- 1 hadoop root  57772855 Aug 18 20:11 sfpd.csv
-rwxrwxrwx 1 hadoop root 287692676 Jul 26 20:39 sfpd.json
[hadoop@hftclclw0001 DEV360Data]$ more auctiondata.csv
8213034705,95,2.927373,jake7870,0,95,117.5,xbox,3
8213034705,115,2.943484,davidbresler2,1,95,117.5,xbox,3
8213034705,100,2.951285,gladimacowgirl,58,95,117.5,xbox,3
8213034705,117.5,2.998947,daysrus,10,95,117.5,xbox,3
8213060420,2,0.065266,donnie4814,5,1,120,xbox,3
8213060420,15.25,0.123218,myreeceyboy,52,1,120,xbox,3
...
...
# The schema is as follows:
# auctionid,bid,bidtime,bidder,bidrate,openbid,price,itemtype,daystolive

# Upload the data to HDFS
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -mkdir -p /spark/exer/mapr
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -put auctiondata.csv /spark/exer/mapr
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -ls /spark/exer/mapr
Found 1 items
-rw-r--r-- 2 hadoop supergroup 575014 2015-10-29 06:17 /spark/exer/mapr/auctiondata.csv
2. Run spark-shell. I use Scala, and analyze the following tasks:
tasks:
a.How many items were sold?
b.How many bids per item type?
c.How many different kinds of item type?
d.What was the minimum number of bids?
e.What was the maximum number of bids?
f.What was the average number of bids?
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ ./bin/spark-shell
...
...
scala >
# First, load the data from HDFS to create an RDD
scala > val originalRDD = sc.textFile("/spark/exer/mapr/auctiondata.csv")
...
...
scala > originalRDD ==> let's look at the type of originalRDD: RDD[String], which can be thought of as an array of Strings, i.e. Array[String]
res26: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21
## Use map to split each line on ","
scala > val auctionRDD = originalRDD.map(_.split(","))
scala> auctionRDD ==> let's look at the type of auctionRDD: RDD[Array[String]], which can be thought of as an array whose elements are themselves arrays, i.e. Array[Array[String]]
res17: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[5] at map at <console>:23

a. How many items were sold?
==> val count = auctionRDD.map(bid => bid(0)).distinct().count()
Simply deduplicate by auctionid: split each record on ",", deduplicate, then count.
# Get the first column, i.e. auctionid, again using map.
# Since auctionRDD is conceptually Array[Array[String]], each element passed to map is an Array[String];
# auctionid is the first field, so take element (0) -- note the parentheses (), not brackets [].
scala> val auctionidRDD = auctionRDD.map(_(0))
...
...
scala> auctionidRDD ==> let's look at the type of auctionidRDD: RDD[String], i.e. an array of all the auctionids
res27: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at map at <console>:26

# Deduplicate auctionidRDD
scala > val auctionidDistinctRDD=auctionidRDD.distinct()

# Count
scala > auctionidDistinctRDD.count()
...
...
b.How many bids per item type?
===> auctionRDD.map(bid => (bid(7),1)).reduceByKey((x,y) => x + y).collect()
# map each row, extract column 7 (the itemtype column), and emit (itemtype, 1).
# The output can be thought of as an array of (String, Int) pairs.
scala > auctionRDD.map(bid=>(bid(7),1))
res30: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[26] at map at <console>:26
...

# reduceByKey reduces all values that share the same key.
# For a given key, e.g.:
# (xbox,1)(xbox,1)(xbox,1)(xbox,1)...(xbox,1) ==> reduceByKey ==> (xbox,(..(((1 + 1) + 1) + ... + 1))
scala > auctionRDD.map(bid=>(bid(7),1)).reduceByKey((x,y) => x + y)
# The type is still (String, Int): String => itemtype, Int => the total bid count for that itemtype
res31: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[28] at reduceByKey at <console>:26

# collect() converts the result into an Array
scala > auctionRDD.map(bid=>(bid(7),1)).reduceByKey((x,y) => x + y).collect()
res32: Array[(String, Int)] = Array((palm,5917), (cartier,1953), (xbox,2784))
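
The remaining tasks (c through f) can be answered with the same map / reduceByKey pattern. Below is a minimal sketch of one possible approach in spark-shell; the variable name bidsPerAuctionRDD is my own and not from the original lab, and auctionidDistinctRDD is the RDD built in task a above.

# c. How many different kinds of item type? Take column 7, deduplicate, count.
scala> auctionRDD.map(bid => bid(7)).distinct().count()

# d/e/f. First count the bids for each auction: emit (auctionid, 1) and sum per key.
scala> val bidsPerAuctionRDD = auctionRDD.map(bid => (bid(0),1)).reduceByKey((x,y) => x + y)

# d. Minimum number of bids: reduce the per-auction counts with Math.min
scala> bidsPerAuctionRDD.map(x => x._2).reduce((x,y) => Math.min(x,y))

# e. Maximum number of bids: same, with Math.max
scala> bidsPerAuctionRDD.map(x => x._2).reduce((x,y) => Math.max(x,y))

# f. Average number of bids per auction = total bids / number of distinct auctions
scala> auctionRDD.count().toDouble / auctionidDistinctRDD.count()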
Thanks for reading. That covers the "Spark MaprLab-Auction Data" example analysis; hopefully it has given you a better feel for the topic, though you should still verify the details in practice yourself.