Summary of Basic Hive Syntax (HQL)

2019/2/20 (Wednesday)

——————————————————————————————————————————————
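Hive Study 3: The three Hive CREATE TABLE approaches explained https://blog.csdn.net/qq_36743482/article/details/78383964
Hive offers three ways to create a table:
1. Direct creation
For example: create table table_name(col_name data_type);
2. Creation from a query (CTAS)
For example: create the table with an AS query. The result of the subquery is stored in the new table, so the table contains data. This is typically used for intermediate tables.
3. Creation with LIKE
This creates a table with exactly the same structure but no data. It is often used for intermediate tables.
(See the link for a detailed explanation.)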


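Hive file formats (the four STORED AS types for tables): https://blog.csdn.net/hereiskxm/article/details/42171325
Hive file storage formats include the following:
1. TEXTFILE
2. SEQUENCEFILE (sequence file)
3. RCFILE
4. ORCFILE (available since Hive 0.11)
Summary
TEXTFILE is the default format and is used when no format is specified at table creation. When data is loaded, the data file is copied to HDFS as-is, without any processing.
Tables in SEQUENCEFILE, RCFILE, or ORCFILE format cannot load data directly from a local file. The data must first be loaded into a TEXTFILE table and then inserted from that table into the SequenceFile, RCFile, or ORCFile table with insert.
(See the link for a detailed explanation.)
Summary by format
1. TEXTFILE is the default format. Data is not compressed, so both disk overhead and parsing overhead are high.
It can be combined with Gzip or Bzip2 (Hive detects the compression and decompresses automatically at query time), but in that mode Hive does not split the data,
and therefore cannot process it in parallel.
2. SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.
SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD has a low compression ratio; BLOCK compression is generally recommended.
3. RCFILE combines row and column storage. First, it partitions the data into row groups, which guarantees that a given record lies within one block and avoids reading several blocks for one record. Second, within each block the data is stored by column, which helps compression and fast column access.
4. ORCFILE()
Conclusion:
Compared with TEXTFILE and SEQUENCEFILE, RCFILE costs more at load time because of its columnar layout, but it offers a better compression ratio and query response. Data warehouses are write-once, read-many, so overall RCFILE has a clear advantage over the other two formats.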
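The row-group idea behind RCFILE can be sketched in a few lines of Python. This is only a conceptual illustration, not the real on-disk format: the function name `to_row_groups`, the sample rows, and the group size are all assumptions made for the example.

```python
# Conceptual sketch only: this is NOT the real RCFile on-disk format.
def to_row_groups(rows, group_size):
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) transposes the chunk: one sequence per column.
        groups.append([list(col) for col in zip(*chunk)])
    return groups


# Hypothetical sample rows shaped like (id, name, ip, country).
rows = [
    (1, "tom", "10.0.0.1", "CN"),
    (2, "ann", "10.0.0.2", "US"),
    (3, "bob", "10.0.0.3", "CN"),
]
layout = to_row_groups(rows, group_size=2)
print(layout)
```

Every record stays inside a single group, while the values of one column sit next to each other within the group, which is what makes column reads and compression cheaper.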

-- Print column headers in the Hive CLI
set hive.cli.print.header=true;

CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
-- Partition columns
PARTITIONED BY(dt STRING, country STRING)
-- Each line of the file is one record of the table
ROW FORMAT DELIMITED
-- '\001' (Ctrl-A) is the field delimiter; note that this is not a tab ('\t')
FIELDS TERMINATED BY '\001'
-- Store as a sequence file
STORED AS SEQUENCEFILE;

-- sequencefile
create table tab_ip_seq(id int,name string,ip string,country string)
row format delimited
fields terminated by ','
stored as sequencefile;

-- external: external table
CREATE EXTERNAL TABLE tab_ip_ext(id int, name string,ip STRING,country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/external/user';

-- Load data from the local file system into a Hive table (in effect, the file is uploaded to the Hive-managed directory in HDFS)
load data local inpath '/home/hadoop/ip.txt' into table tab_ext;

-- Load data from HDFS into a Hive table (in effect, the file is moved from its original directory into the Hive-managed directory)
load data inpath 'hdfs://ns1/aa/bb/data.log' into table tab_user;

-- Bulk-insert data with a select statement
insert overwrite table tab_ip_seq select * from tab_ext;

-- create & load
create table tab_ip(id int,name string,ip string,country string)
row format delimited
fields terminated by ','
stored as textfile;

-- CTAS: derive the table structure from a select statement
CREATE TABLE tab_ip_ctas
AS
SELECT id new_id, name new_name, ip new_ip,country new_country FROM tab_ip_ext
-- Sort by new_id
SORT BY new_id;

-- CLUSTER: bucketed tables (somewhat more advanced; study this when you have the time and energy)
create table tab_ip_cluster(id int,name string,ip string,country string)
-- Cluster rows by id into 3 buckets
clustered by(id) into 3 buckets;

load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;
-- Make Hive enforce bucketing on insert
set hive.enforce.bucketing=true;
insert into table tab_ip_cluster select * from tab_ip;

select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
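To make the bucket sampling above more concrete, here is a minimal Python sketch of how a row could be assigned to a bucket and which rows a `tablesample(bucket 2 out of 3 on id)` would keep. It assumes that the hash of an INT column is the value itself and that the bucket index is the non-negative hash modulo the bucket count; the function names and sample rows are hypothetical.

```python
INT_MAX = 0x7FFFFFFF


def bucket_of(id_value, num_buckets):
    """0-based bucket index for an INT key.

    Assumption: the hash of an INT column is the value itself.
    """
    return (id_value & INT_MAX) % num_buckets


def tablesample(rows, x, y):
    """Rows that `tablesample(bucket x out of y on id)` would keep (x is 1-based)."""
    return [row for row in rows if bucket_of(row[0], y) == x - 1]


# Hypothetical rows shaped like (id, name).
rows = [(i, "user%d" % i) for i in range(1, 8)]
print(tablesample(rows, 2, 3))
```

Under these assumptions, bucket 2 of 3 holds the ids whose remainder modulo 3 is 1.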

-- PARTITION: partitioned table
create table tab_ip_part(id int,name string,ip string,country string)
partitioned by (year string)
row format delimited
fields terminated by ',';

load data local inpath '/home/hadoop/data.log' overwrite into table tab_ip_part
partition(year='1990');

load data local inpath '/home/hadoop/data2.log' overwrite into table tab_ip_part
partition(year='2000');

The statement above loads /home/hadoop/data2.log into the year=2000 partition of tab_ip_part.
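Each partition is stored as its own subdirectory named `column=value` under the table directory. The Python sketch below builds such a path; the warehouse location `/user/hive/warehouse` is an assumption (it depends on the cluster configuration), and `partition_dir` is a hypothetical helper, not part of Hive.

```python
import posixpath


def partition_dir(table_dir, **partition_spec):
    """Build the directory of one partition, e.g. .../year=2000."""
    parts = ["%s=%s" % (column, value) for column, value in partition_spec.items()]
    return posixpath.join(table_dir, *parts)


# Assumption: the table lives under a default-style warehouse directory.
table_dir = "/user/hive/warehouse/tab_ip_part"
print(partition_dir(table_dir, year="2000"))
```

A query that filters on the partition column only needs to read the matching directory, which is why partitioning speeds up such queries.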

select * from tab_ip_part;
select * from tab_ip_part where year='2000';
select count(*) from tab_ip_part where year='2000';

Alter commands
alter table tab_ip change id id_alter string;
ALTER TABLE tab_cts ADD PARTITION (partCol = 'dt') location '/external/hive/dt';

-- Show the table's partitions
show partitions tab_ip_part;

HIVE bucketed tables: detailed explanation and creation examples https://www.cnblogs.com/kouryoushine/p/7809299.html
Differences between Hive partitioned tables and bucketed tables https://blog.csdn.net/jenrey/article/details/80588493
Hive basics: partitions, buckets, Sort Merge Bucket Join https://blog.csdn.net/wisgood/article/details/17186107
Partitioning and bucketing are advanced Hive features that are worth studying in depth (see the xinghuan materials).

-- insert from select: bulk-insert data into another table with a select statement
create table tab_ip_like like tab_ip;
insert overwrite table tab_ip_like select * from tab_ip;

-- Write query results to a local directory or to a directory in HDFS (the target is treated as a directory, even if its name looks like a file)
insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where year='1990';
insert overwrite directory '/hiveout.txt' select * from tab_ip_part where year='1990';

-- cli shell: run HQL statements from the shell
hive -S -e 'select country,count(*) from tab_ext group by country' > /home/hadoop/hivetemp/e.txt
select * from tab_ext sort by id desc limit 5;
select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);

Using the Hive complex types array, map, and struct: https://blog.csdn.net/u010670689/article/details/72885944
-- array
create table tab_array(a array<int>,b array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';

select a[0] from tab_array;
select * from tab_array where array_contains(b,'word');
insert into table tab_array select array(0),array(name,ip) from tab_ext t;

Using arrays in Hive https://blog.csdn.net/zhao897426182/article/details/78347960

-- map
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
-- Delimiter between collection items (here, between map entries)
collection items terminated by ','
-- Delimiter between a map key and its value
map keys terminated by ':';

load data local inpath '/home/hadoop/hivetemp/tab_map.txt' overwrite into table tab_map;
insert into table tab_map select name,map('name',name,'ip',ip) from tab_ext;

-- struct
create table tab_struct(name string,info struct<age:int,tel:string,addr:string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';

load data local inpath '/home/hadoop/hivetemp/tab_st.txt' overwrite into table tab_struct;
insert into table tab_struct select name,named_struct('age',id,'tel',name,'addr',country) from tab_ext;
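The delimiters declared for tab_map and tab_struct determine how a line of the text data file is split into fields. The following Python sketch parses one line for each table under those delimiters ('\t' between fields, ',' between items, ':' between map key and value). The sample lines and function names are made up for illustration; Hive's own parsing is done by its SerDe, not by code like this.

```python
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Parse one tab_map line into (name, info dict)."""
    name, raw_info = line.rstrip("\n").split(field_sep)
    info = dict(item.split(key_sep, 1) for item in raw_info.split(item_sep))
    return name, info


def parse_struct_row(line, field_sep="\t", item_sep=","):
    """Parse one tab_struct line into (name, {age, tel, addr})."""
    name, raw_info = line.rstrip("\n").split(field_sep)
    age, tel, addr = raw_info.split(item_sep)
    return name, {"age": int(age), "tel": tel, "addr": addr}


# Hypothetical sample lines, as they might appear in tab_map.txt and tab_st.txt.
print(parse_map_row("tom\tname:tom,ip:10.0.0.1\n"))
print(parse_struct_row("tom\t20,13800000000,beijing\n"))
```

Struct fields are positional in the file: their names come from the table definition, whereas map keys are stored in the data itself.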

-- UDF (user-defined functions): operations staff can skip this step; study it if you are interested
select if(id=1,'first','no-first'),name from tab_ext;
hive>add jar /home/hadoop/myudf.jar;
hive>CREATE TEMPORARY FUNCTION fanyi AS 'cn.itcast.hive.Fanyi';
select id,name,ip,fanyi(country) from tab_ip_ext;

What is a Hive UDF? https://blog.csdn.net/yqlakers/article/details/70211522
First, what is a UDF? UDF stands for user-defined function. Why does it exist? Sometimes the query you want to write cannot easily be expressed with Hive's built-in functions. By writing a UDF, you can plug your own processing code into Hive and use it in queries, which amounts to defining custom functions in HQL (Hive SQL). UDFs are written in Java, the language Hive itself is written in, so solid Java skills are a basic requirement for anyone who wants to master Hadoop-related technologies.

Author: YQlakers
Source: CSDN
Original: https://blog.csdn.net/yqlakers/article/details/70211522
Copyright notice: This is an original article by the blogger; please include a link to the post when reposting.

Advanced HQL queries (Hive query syntax)
Roles and usage of order by, sort by, distribute by, and cluster by in Hive: https://blog.csdn.net/jthink_/article/details/38903775
MapReduce scripts
Join
Inner join
Outer join
Semi join
Map join
Subquery
View
Hive's order by clause makes the final output totally ordered. However, because Hive runs on top of Hadoop, producing a totally ordered result forces Hadoop to use a single reducer for the job. The side effect is reduced efficiency.

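If you do not need the final result to be totally ordered, you can sort with the sort by clause instead, which only guarantees that the output of each reducer is ordered. If you want certain rows to be handled by the same reducer, use the distribute by clause. For example:
Table student (classNo, stuNo, score) contains the following data: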
C01 N0101 82
C01 N0102 59
C02 N0201 81
C01 N0103 65
C03 N0302 92
C02 N0202 82
C02 N0203 79
C03 N0301 56
C03 N0306 72
We want to output each class's scores from lowest to highest. Run the following statement:
Select classNo,stuNo,score from student distribute by classNo sort by score;
The output is:
C02 N0203 79
C02 N0201 81
C02 N0202 82
C03 N0301 56
C03 N0306 72
C03 N0302 92
C01 N0102 59
C01 N0103 65
C01 N0101 82
We can see that the scores within each class are ordered. All records with the same classNo are distributed to the same reducer, and sort by guarantees that the output of each reducer is ordered.
Note:
To see the effect of distribute by in the example above, first configure enough reducers. The example has 3 distinct classNo values, so the number of reducers must be at least 3. With fewer than 3 reducers, several different classNo values are sent to the same reducer, and the output is not what you expect. The setting is:
set mapred.reduce.tasks = 3;
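The interplay of distribute by and sort by can be simulated in plain Python. This is a rough sketch under stated assumptions: the numeric part of classNo modulo the reducer count stands in for Hive's real hash partitioning, and `distribute_and_sort` is a hypothetical name. It shows why too few reducers mixes classes within one reducer's sorted output.

```python
from collections import defaultdict

# The student table from the example: (classNo, stuNo, score).
student = [
    ("C01", "N0101", 82), ("C01", "N0102", 59), ("C02", "N0201", 81),
    ("C01", "N0103", 65), ("C03", "N0302", 92), ("C02", "N0202", 82),
    ("C02", "N0203", 79), ("C03", "N0301", 56), ("C03", "N0306", 72),
]


def distribute_and_sort(rows, num_reducers):
    """Simulate `distribute by classNo sort by score`."""
    reducers = defaultdict(list)
    for row in rows:
        # Assumption: a stand-in hash (numeric part of classNo), not Hive's.
        reducer_id = int(row[0][1:]) % num_reducers
        reducers[reducer_id].append(row)
    # sort by: each reducer orders only the rows it received.
    return {rid: sorted(rs, key=lambda r: r[2]) for rid, rs in reducers.items()}


for reducer_id, rows in sorted(distribute_and_sort(student, 3).items()):
    print(reducer_id, rows)
```

With 3 reducers every class lands on its own reducer in this simulation; calling `distribute_and_sort(student, 2)` puts C01 and C03 on the same reducer, so their rows are interleaved by score.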

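MapReduce scripts
If we need to call an external script, for example a Python script, from a query, we can use the
transform, map, and reduce clauses.
For example, suppose we want to filter out all failing student records and output only the scores of students who passed.
Create a Python script file score_pass.py with the following content:
#!/usr/bin/env python
import sys

# Hive feeds each row to stdin as one tab-separated line.
for line in sys.stdin:
    (classNo, stuNo, score) = line.strip().split('\t')
    # Keep only passing scores (60 or above).
    if int(score) >= 60:
        print("%s\t%s\t%s" % (classNo, stuNo, score))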

Run the following statements:
add file /home/user/score_pass.py;
select transform(classNo,stuNo,score) using 'score_pass.py' as classNo,stuNo,score from student;
The output is:
C01 N0101 82
C02 N0201 81
C01 N0103 65
C03 N0302 92
C02 N0202 82
C02 N0203 79
C03 N0306 72
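Note:
1) In the Python script above, the input delimiter is the tab character (\t), and the output must be tab-delimited as well. This is Hive's default for transform scripts; if the script uses another delimiter without Hive being told, columns are misparsed or the query fails. The script must also call strip to remove the trailing newline (or simply use line.split() with no arguments instead).
2) Before using a script, register it with add file so that Hive distributes it to the Hadoop cluster.
3) transform passes the data to the Python script, and the as clause names the output columns.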
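Because a transform script only reads tab-separated lines and writes tab-separated lines, its logic can be checked locally before it is registered with add file. The sketch below wraps the same filter in a function and feeds it an in-memory stream; `filter_passing` and the sample input are hypothetical names and data used only for this check.

```python
import io


def filter_passing(lines, out, passing_score=60):
    """Copy tab-separated (classNo, stuNo, score) lines with a passing score."""
    for line in lines:
        class_no, stu_no, score = line.strip().split("\t")
        if int(score) >= passing_score:
            out.write("%s\t%s\t%s\n" % (class_no, stu_no, score))


# Simulate the rows Hive would pipe into the script.
sample = io.StringIO("C01\tN0101\t82\nC01\tN0102\t59\nC03\tN0301\t56\n")
result = io.StringIO()
filter_passing(sample, result)
print(result.getvalue(), end="")
```

In the real script the same function would be called with `sys.stdin` and `sys.stdout`.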

Join
Writing joins directly as Hadoop MapReduce programs is fairly time-consuming. Hive greatly simplifies this.
Inner join
This is similar to an inner join in SQL. Run the following statement to query each student's number and teacher name:
Select a.stuNo,b.teacherName from student a join teacher b on a.classNo = b.classNo;
The output is:
N0203 Sun
N0202 Sun
N0201 Sun
N0306 Wang
N0301 Wang
N0302 Wang
N0103 Zhang
N0102 Zhang
N0101 Zhang

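Note:
See the previous article for the contents of the data files.
Avoid the syntax select xx from aa,bb where aa.f=bb.f. Older Hive versions do not support it (implicit join notation was only added in Hive 0.13).
To view Hive's execution plan, prefix the statement with explain, for example:
explain Select a.stuNo,b.teacherName from student a join teacher b on a.classNo = b.classNo;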

Outer join
As in traditional SQL, Hive provides left outer join, right outer join, and full outer join.

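Semi join
Older Hive versions do not support IN subqueries (support for IN/EXISTS subqueries in the where clause was added in Hive 0.13). In that case you can use left semi join to get the same result. Run the following statement:
Select * from teacher left semi join student on student.classNo = teacher.classNo;
The output is: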
C02 Sun
C03 Wang
C01 Zhang
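As you can see, C04 Dong does not appear in the result, because C04 does not exist in table student.
Note:
Columns of the right-hand table (student) may appear only in the on clause and nowhere else; for example, they cannot appear in the select clause.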
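The semantics of a left semi join can be sketched in Python: a left row is kept once if at least one matching key exists on the right, and no right-hand columns are returned. The teacher rows below are inferred from the outputs shown in this article (including C04 Dong); `left_semi_join` is a hypothetical helper, not Hive's implementation.

```python
# Teacher rows inferred from the outputs shown in the article: (classNo, teacherName).
teacher = [("C01", "Zhang"), ("C02", "Sun"), ("C03", "Wang"), ("C04", "Dong")]
# classNo of every student row, in table order.
student_class_nos = ["C01", "C01", "C02", "C01", "C03", "C02", "C02", "C03", "C03"]


def left_semi_join(left_rows, right_keys, key=lambda row: row[0]):
    """Keep each left row whose key occurs at least once on the right."""
    right = set(right_keys)  # duplicates on the right do not multiply left rows
    return [row for row in left_rows if key(row) in right]


for row in left_semi_join(teacher, student_class_nos):
    print(row)
```

Unlike an inner join, each teacher appears once even though several students share the class, which matches the behaviour of an IN subquery.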

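Map join
When one table is small enough to be loaded entirely into memory, you can use a map join to improve efficiency,
for example:
Select /*+ mapjoin(b) */ a.stuNo,b.teacherName from student a join teacher b on a.classNo = b.classNo;
The /*+ mapjoin(b) */ hint uses C-style comment syntax; it names the small table, here by its alias b.
Map joins are also a good fit when the join uses non-equality conditions. Understanding why requires deeper knowledge of how Hive and MapReduce work.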
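The idea behind a map join can be sketched as an in-memory hash join: the small table is loaded into a dictionary, and the large table is streamed past it without a shuffle or reduce phase. The Python below is an illustration only; the teacher rows are inferred from the outputs in this article and `map_join` is a hypothetical name.

```python
# Small table, inferred from the article's outputs: (classNo, teacherName).
teacher = [("C01", "Zhang"), ("C02", "Sun"), ("C03", "Wang"), ("C04", "Dong")]
# Large table: (classNo, stuNo, score).
student = [
    ("C01", "N0101", 82), ("C01", "N0102", 59), ("C02", "N0201", 81),
    ("C01", "N0103", 65), ("C03", "N0302", 92), ("C02", "N0202", 82),
    ("C02", "N0203", 79), ("C03", "N0301", 56), ("C03", "N0306", 72),
]


def map_join(big_rows, small_rows):
    """Stream the big table and probe an in-memory lookup of the small one."""
    lookup = {class_no: name for class_no, name in small_rows}
    for class_no, stu_no, _score in big_rows:
        if class_no in lookup:
            yield stu_no, lookup[class_no]


for stu_no, teacher_name in map_join(student, teacher):
    print(stu_no, teacher_name)
```

Each mapper can do this independently on its slice of the large table, which is why the approach only works when the small table fits in memory.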

Subquery
The following statement returns the highest of all class average scores.
Select max(avgScore) as maScore from
(Select classNo,avg(score) as avgScore from student group by classNo) a;
Output:
80.66666666666667
In the statement above, the parenthesized part is a subquery with the alias a. The result of a subquery behaves like a table and can be queried further.
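The two-level aggregation can be reproduced in Python to check the arithmetic: first the average score per class (the inner query), then the maximum of those averages (the outer query). The function name `max_class_average` is hypothetical; the data is the student table from this article.

```python
from collections import defaultdict

# The student table from the example: (classNo, stuNo, score).
student = [
    ("C01", "N0101", 82), ("C01", "N0102", 59), ("C02", "N0201", 81),
    ("C01", "N0103", 65), ("C03", "N0302", 92), ("C02", "N0202", 82),
    ("C02", "N0203", 79), ("C03", "N0301", 56), ("C03", "N0306", 72),
]


def max_class_average(rows):
    """Inner step: avg(score) group by classNo. Outer step: max of the averages."""
    totals = defaultdict(lambda: [0, 0])  # classNo -> [sum of scores, row count]
    for class_no, _stu_no, score in rows:
        totals[class_no][0] += score
        totals[class_no][1] += 1
    averages = {c: total / count for c, (total, count) in totals.items()}
    return max(averages.values())


print(max_class_average(student))
```

Class C02 has the highest average, (81 + 82 + 79) / 3, which agrees with the query result up to floating-point rounding.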

View
As in traditional databases, a Hive view is only a definition; the view's data is not stored in the file system. Views are also read-only.
Run the following two commands:
Create view avg_score as
Select classNo,avg(score) as avgScore from student group by classNo;
Select max(avgScore) as maScore From avg_score;
The output is the same as in the previous example.

Hive data types
--- Primitive types
--- Complex types
