> 技术文档 > 大数据开发之Hive篇-Hive函数详解_hive开发

大数据开发之Hive篇-Hive函数详解_hive开发


一.Hive函数概述

1.1 Hive函数分类

  1. 函数/UDF
    输入一行记录,输出一行记录
    示例:upper/lower/length

  2. 聚集函数/UDAF
    输入多行记录,输出一行记录
    示例:sum/count/avg

  3. 表生成函数/UDTF
    输入一行记录,输出多行记录
    示例:explode

1.2 查看Hive函数

1.2.1 show functions命令

show functions命令是查询当前hive支持哪些函数,可以看到hive总共支持248个函数,总体来说函数与MySQL的比较类似,但是也包含 to_date、decode等Oracle独有的函数(函数用法与Oracle存在差别)。真的是给从关系型数据库转到大数据库开发的人提供了便利。

hive> SHOW FUNCTIONS;OK!!=$sum0%&*+-/<<====>>=^absacosadd_monthsandarrayarray_containsasciiasinassert_trueatanavgbase64betweenbinbroundcasecbrtceilceilingchrcoalescecollect_listcollect_setcompute_statsconcatconcat_wscontext_ngramsconvcorrcoscountcovar_popcovar_sampcrc32create_unioncume_distcurrent_databasecurrent_datecurrent_timestampcurrent_userdate_adddate_formatdate_subdatediffdaydayofmonthdayofweekdecodedegreesdense_rankdiveeltencodeewah_bitmapewah_bitmap_andewah_bitmap_emptyewah_bitmap_orexpexplodefactorialfieldfind_in_setfirst_valuefloorfloor_dayfloor_hourfloor_minutefloor_monthfloor_quarterfloor_secondfloor_weekfloor_yearformat_numberfrom_unixtimefrom_utc_timestampget_json_objectget_splitsgreatestgroupinghashhexhistogram_numerichourifinin_fileindexinitcapinlineinstrinternal_intervalisnotnullisnulljava_methodjson_tuplelaglast_daylast_valuelcaseleadleastlengthlevenshteinlikelnlocateloglog10log2logged_in_userlowerlpadltrimmapmap_keysmap_valuesmaskmask_first_nmask_hashmask_last_nmask_show_first_nmask_show_last_nmatchpathmaxmd5minminutemonthmonths_betweennamed_structnegativenext_dayngramsnoopnoopstreamingnoopwithmapnoopwithmapstreamingnotntilenvlorparse_urlparse_url_tuplepercent_rankpercentilepercentile_approxpipmodposexplodepositivepowpowerprintfquarterradiansrandrankreflectreflect2regexpregexp_extractregexp_replacerepeatreplacereverserlikeroundrow_numberrpadrtrimsecondsentencesshasha1sha2shiftleftshiftrightshiftrightunsignedsignsinsizesort_arraysoundexspacesplitsqrtstackstdstddevstddev_popstddev_sampstr_to_mapstructsubstrsubstringsubstring_indexsumtanto_dateto_unix_timestampto_utc_timestamptranslatetrimtruncucaseunbase64unhexunix_timestampupperuuidvar_popvar_sampvarianceversionweekofyearwhenwindowingtablefunctionxpathxpath_booleanxpath_doublexpath_floatxpath_intxpath_longxpath_numberxpath_shortxpath_stringyear|~Time taken: 1.246 seconds, Fetched: 248 row(s)hive> 

1.2.2 DESCRIBE 命令

语法:

DESCRIBE FUNCTION ;DESCRIBE FUNCTION EXTENDED ;

测试记录:

hive> > > desc function decode;OKdecode(bin, str) - Decode the first argument using the second argument character setTime taken: 0.023 seconds, Fetched: 1 row(s)hive> desc function extended decode;OKdecode(bin, str) - Decode the first argument using the second argument character setPossible options for the character set are \'US-ASCII\', \'ISO-8859-1\',\'UTF-8\', \'UTF-16BE\', \'UTF-16LE\', and \'UTF-16\'. If either argumentis null, the result will also be nullTime taken: 0.027 seconds, Fetched: 4 row(s)hive> 

二.Hive的数值函数

下列是Hive常见的数值函数:

函数名 描述 返回值类型 round(DOUBLE a)
round(DOUBLE a, INT d) 四舍五入函数 DOUBLE floor(DOUBLE a) 返回小于等于a的整数 BIGINT ceil(DOUBLE a), ceiling(DOUBLE a) 返回大于等于a的整数 BIGINT rand(), rand(INT seed) 返回0 到1 之间的随机数 DOUBLE pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) 返回a的p次方 DOUBLE sqrt(DOUBLE a), sqrt(DECIMAL a) 开方函数 DOUBLE bin(BIGINT a) 返回二进制数 STRING abs(DOUBLE a) 返回a的绝对值 DOUBLE greatest(T v1, T v2, ...) 返回最大值 T least(T v1, T v2, ...) 返回最小值 T

2.1 round函数

round函数,用于四舍五入。

测试记录:

hive> > select round(5.1);OK5.0Time taken: 0.09 seconds, Fetched: 1 row(s)hive> select round(5.5);OK6.0Time taken: 0.07 seconds, Fetched: 1 row(s)hive> > select round(5.5,1);OK5.5Time taken: 0.086 seconds, Fetched: 1 row(s)

2.2 floor函数

floor向下取整

测试记录:

hive> > > select floor(5.4);OK5Time taken: 0.092 seconds, Fetched: 1 row(s)hive> select floor(5.5);OK5Time taken: 0.074 seconds, Fetched: 1 row(s)hive> select floor(5.9);OK5Time taken: 0.08 seconds, Fetched: 1 row(s)hive> 

2.3 ceil函数

ceil 向上取整

测试记录:

hive> > select ceil(5.4);OK6Time taken: 0.094 seconds, Fetched: 1 row(s)hive> select ceil(5.1);OK6Time taken: 0.063 seconds, Fetched: 1 row(s)hive> select ceil(5.9);OK6Time taken: 0.08 seconds, Fetched: 1 row(s)

2.4 rand()

rand 随机数函数

测试记录:

-- 取值1-100之间的随机数hive> > select round(100*rand());Query ID = root_20201208110850_c85b86e9-012e-449d-a1f5-0b158741315bTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0141, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0141/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0141Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-08 11:08:57,146 Stage-1 map = 0%, reduce = 0%2020-12-08 11:09:03,404 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.04 secMapReduce Total cumulative CPU time: 3 seconds 40 msecEnded Job = job_1606698967173_0141MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 3.04 sec HDFS Read: 5300 HDFS Write: 104 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 3 seconds 40 msecOK62.0Time taken: 14.362 seconds, Fetched: 1 row(s)

2.5 power函数

power平方函数

测试记录:

-- 求2的3次方hive> > select power(2,3);OK8.0Time taken: 0.087 seconds, Fetched: 1 row(s)

2.6 sqrt函数

sqrt开方函数

测试记录:

hive> > select sqrt(9);OK3.0Time taken: 0.074 seconds, Fetched: 1 row(s)

2.7 bin

bin函数,返回二进制数

测试记录:

hive> > > select bin(10);OK1010Time taken: 0.085 seconds, Fetched: 1 row(s)hive> select bin(9);OK1001Time taken: 0.065 seconds, Fetched: 1 row(s)hive> 

2.8 abs函数

abc函数求绝对值

测试记录:

hive> > select abs(10);OK10Time taken: 0.09 seconds, Fetched: 1 row(s)hive> select abs(-10);OK10Time taken: 0.084 seconds, Fetched: 1 row(s)

2.9 greatest函数

greatest函数,求最大值函数,如果有参数为null,就会返回null

测试记录:

hive> > select greatest(1,2,3);OK3Time taken: 0.094 seconds, Fetched: 1 row(s)hive> select greatest(1,2,3,1.5);OK3.0Time taken: 0.075 seconds, Fetched: 1 row(s)hive> > select greatest(1,2,3,null);OKNULLTime taken: 0.084 seconds, Fetched: 1 row(s)

2.10 least函数

least函数求最小值,如参数中有null值,则返回null

测试记录:

hive> > > select least(1,2,3);OK1Time taken: 0.094 seconds, Fetched: 1 row(s)hive> select least(1,2,3,-1);OK-1Time taken: 0.079 seconds, Fetched: 1 row(s)hive> select least(1,2,3,null);OKNULLTime taken: 0.059 seconds, Fetched: 1 row(s)hive> 

三.Hive的日期时间函数

下列是Hive常见的日期时间函数:

函数名 描述 返回值类型 from_unixtime(bigint unixtime[, string format]) 与UNIX_TIMESTAMP()结合使用,数值转换成日期时间类型 string unix_timestamp()
unix_timestamp(string date)
unix_timestamp(string date, string pattern) 将时间转成数字,与FROM_UNIXTIME相互对应 bigint to_date(string timestamp) 返回时间戳类型的日期时间部分 date year(string date) 返回年份 int quarter(date/timestamp/string) 返回季度 int month(string date) 返回月份 int day(string date) dayofmonth(date) 返回日 int hour(string date) 返回小时 int minute(string date) 返回分钟 int second(string date) 返回秒 int weekofyear(string date) 返回当年的星期数 int extract(field FROM source) 提取日期的部分 int datediff(string enddate, string startdate) 返回两个日期间隔的天数 int date_add(date/timestamp/string startdate, tinyint/smallint/int days) 增加日期天数 date date_sub(date/timestamp/string startdate, tinyint/smallint/int days) 减去日期天数 date from_utc_timestamp({any primitive type} ts, string timezone) 将UTC中的时间戳*转换为给定时区 timestamp to_utc_timestamp({any primitive type} ts, string timezone) 将给定时区中的时间戳*转换为UTC timestamp current_date 返回当前日期 date current_timestamp 返回当前时间 timestamp add_months(string start_date, int num_months, output_date_format) 增加月份 string last_day(string date) 求本月最后一天 string next_day(string start_date, string day_of_week) 返回比start_date晚并命名为day_of_week的第一个日期 string trunc(string date, string format) 返回格式指定的单元截断的日期 string months_between(date1, date2) 两个日期之间间隔的月份 double date_format(date/timestamp/string ts, string fmt) 格式日期指定 string

3.1 from_unixtime和UNIX_TIMESTAMP

from_unixtime 将数字转成时间,与UNIX_TIMESTAMP相互对应
UNIX_TIMESTAMP将时间转成数字,与FROM_UNIXTIME相互对应
测试记录:

hive> > select UNIX_TIMESTAMP(\'2020-12-09 23:50:59\');OK1607529059Time taken: 0.101 seconds, Fetched: 1 row(s)hive> > select from_unixtime(1607529059);OK2020-12-09 23:50:59Time taken: 0.084 seconds, Fetched: 1 row(s)

3.2 to_date函数

to_date(string timestamp) 函数 (string timestamp)

hive> > > select to_date(\'2020-12-09 23:59:59 123456789\');OK2020-12-09Time taken: 0.109 seconds, Fetched: 1 row(s)hive> select to_date(\'2020-12-09 23:59:59\');OK2020-12-09Time taken: 0.086 seconds, Fetched: 1 row(s)

3.3 year/quarter/month/day/hour/minute/second/weekofyear

  1. year(string date) 返回年份
  2. quarter(date/timestamp/string) 返回季度
  3. month(string date) 返回月份
  4. day(string date) dayofmonth(date) 返回日
  5. hour(string date) 返回小时
  6. minute(string date) 返回分钟
  7. second(string date) 返回秒

测试记录:

hive> > select year(dt) dt_year, > quarter(dt) dt_quarter, > month(dt) dt_month, > day(dt) dt_day, > hour(dt) dt_hour, > minute(dt) dt_minute, > second(dt) dt_second > from > ( > select \'2020-12-09 23:50:59\' as dt > ) tmp;OKdt_year dt_quarter dt_month dt_day dt_hour dt_minute dt_second2020 4 12 9 23 50 59Time taken: 0.098 seconds, Fetched: 1 row(s)hive> 

3.4 weekofyear 函数

weekofyear(string date) 返回当年的星期数

测试记录:
\'2020-12-09\'是本年的第50周

hive> > select weekofyear(\'2020-12-09\');OK_c050Time taken: 0.073 seconds, Fetched: 1 row(s)

3.5 extract 函数

extract(field FROM source) 提取日期的部分
代码:

-- 提取年SELECT EXTRACT(YEAR FROM \'2020-12-09\');-- 提取月SELECT EXTRACT(MONTH FROM \'2020-12-09\');-- 提取日SELECT EXTRACT(DAY FROM \'2020-12-09\');-- 提取小时SELECT EXTRACT(HOUR FROM \'2020-12-09 13:02:01\');-- 提取分钟SELECT EXTRACT(MINUTE FROM \'2020-12-09 13:02:01\');-- 提取秒SELECT EXTRACT(SECOND FROM \'2020-12-09 13:02:01\');

测试记录:

hive> > SELECT EXTRACT(YEAR FROM \'2020-12-09\');OK_c02020Time taken: 1.85 seconds, Fetched: 1 row(s)hive> SELECT EXTRACT(MONTH FROM \'2020-12-09\');OK_c012Time taken: 0.227 seconds, Fetched: 1 row(s)hive> SELECT EXTRACT(DAY FROM \'2020-12-09\');OK_c09Time taken: 0.089 seconds, Fetched: 1 row(s)hive> SELECT EXTRACT(HOUR FROM \'2020-12-09 13:02:01\');OK_c013Time taken: 0.097 seconds, Fetched: 1 row(s)hive> SELECT EXTRACT(MINUTE FROM \'2020-12-09 13:02:01\');OK_c02Time taken: 0.081 seconds, Fetched: 1 row(s)hive> SELECT EXTRACT(SECOND FROM \'2020-12-09 13:02:01\');OK_c01

3.6 datediff 函数

datediff(string enddate, string startdate) 返回两个日期间隔的天数

测试记录:

hive> > > select datediff(\'2020-12-09\',\'2020-11-09\');OK_c030Time taken: 0.101 seconds, Fetched: 1 row(s)hive> select datediff(\'2020-12-09 23:50:59\',\'2020-11-09\');OK_c030Time taken: 0.113 seconds, Fetched: 1 row(s)

3.6 date_add/date_sub 函数

date_add(date/timestamp/string startdate, tinyint/smallint/int days) 增加日期间隔,默认为天
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) 减去日期间隔,默认为天

测试记录:

hive> > > SELECT DATE_ADD(\'2020-01-01\', 31);OK_c02020-02-01Time taken: 0.098 seconds, Fetched: 1 row(s)hive> SELECT DATE_ADD(\'2020-01-01\', -31);OK_c02019-12-01Time taken: 0.075 seconds, Fetched: 1 row(s)hive> > SELECT DATE_SUB(\'2020-01-01\', 31);OK_c02019-12-01Time taken: 0.075 seconds, Fetched: 1 row(s)hive> SELECT DATE_SUB(\'2020-01-01\', -31);OK_c02020-02-01Time taken: 0.071 seconds, Fetched: 1 row(s)

3.7 from_utc_timestamp/to_utc_timestamp 函数

from_utc_timestamp({any primitive type} ts, string timezone) 将UTC中的时间戳转换为给定时区
to_utc_timestamp({any primitive type} ts, string timezone) 将给定时区中的时间戳
转换为UTC

测试记录:
CST是美国时间简写
UTC的

hive> > select from_utc_timestamp(\'2020-12-09 23:59:59\',\'CST\');OK_c02020-12-09 17:59:59Time taken: 0.078 seconds, Fetched: 1 row(s)hive> > select to_utc_timestamp(\'2020-12-09 23:59:59\',\'CST\');OK_c02020-12-10 05:59:59Time taken: 0.085 seconds, Fetched: 1 row(s)

3.8 current_date/current_timestamp

current_date 返回当前日期
current_timestamp 返回当前时间

测试记录:

hive> > select current_date() cur_date,current_timestamp() cur_timestamp;OKcur_date cur_timestamp2020-12-09 2020-12-09 14:58:44.535Time taken: 0.079 seconds, Fetched: 1 row(s)

3.9 add_months 函数

add_months(string start_date, int num_months, output_date_format) 增加月份

测试记录:

hive> > > select add_months(\'2020-12-31 23:59:59\',2,\'YYYY-MM-dd HH:mm:ss\');OK_c02021-02-28 23:59:59Time taken: 0.097 seconds, Fetched: 1 row(s)

3.10 last_day 函数

last_day(string date) 求本月最后一天

测试记录:

hive> > select last_day(\'2020-12-09\');OK_c02020-12-31Time taken: 0.104 seconds, Fetched: 1 row(s)hive> select last_day(\'2020-12-09 23:59:59\');OK_c02020-12-31Time taken: 0.077 seconds, Fetched: 1 row(s)

3.10 next_day函数

next_day(string start_date, string day_of_week) 返回比start_date晚并命名为day_of_week的第一个日期

测试记录:
求下一个礼拜二

hive> > > select next_day(\'2020-12-09\',\'TU\');OK_c02020-12-15Time taken: 0.086 seconds, Fetched: 1 row(s)

3.11 trunc 函数

trunc(string date, string format) 返回格式指定的单元截断的日期
支持格式:MONTH/MON/MM, YEAR/YYYY/YY

测试记录:

hive> > select trunc(\'2020-12-09 23:59:59\',\'YYYY\');OK_c02020-01-01hive> > select trunc(\'2020-12-09 23:59:59\',\'MM\');OK_c02020-12-01Time taken: 0.063 seconds, Fetched: 1 row(s)

3.12 months_between 函数

months_between(date1, date2) 两个日期之间间隔的月份

date1和date2之间的月份间隔,会有小数的存在,另外,如果date1小于date2,那么就会出现负数

测试记录:

hive> > > select months_between(\'2020-12-09\',\'2020-01-01\');OK_c011.25806452Time taken: 0.09 seconds, Fetched: 1 row(s)hive> > select months_between(\'2020-12-09\',\'2021-01-01\');OK_c0-0.74193548Time taken: 0.083 seconds, Fetched: 1 row(s)hive> 

3.13 date_format 函数

date_format(date/timestamp/string ts, string fmt) 格式日期指定

测试记录:

hive> SELECT date_format(\'2020-12-09 23:59:59\', \'yyyy.MM.dd HH:mm:ss\');OK_c02020.12.09 23:59:59Time taken: 0.071 seconds, Fetched: 1 row(s)

四.条件函数

下列是Hive常见的条件函数:

函数名 描述 返回值类型 nvl(a,b) 当a为null的时候返回b,否则返回a T isnull( a ) 当a为null的时候返回true,否则返回false boolean isnotnull ( a ) 当a为null的时候返回false,否则返回true boolean if(boolean testCondition, T valueTrue, T valueFalseOrNull) 当testCondition为true 返回 valueTrue,否则返回 valueFalseOrNull) T COALESCE(T v1, T v2, ...) 返回第一个不为null,当所有的都是null,就返回null T CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END case条件判断语句 T CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END case值判断语句 T

4.1 nvl函数

nvl(a,b) 当a为null的时候返回b,否则返回a

测试记录:

hive> > select nvl(null,123);OK_c0123hive> > select nvl(\'345\',123);OK_c0345Time taken: 0.067 seconds, Fetched: 1 row(s)

4.2 isnull/isnotnull 函数

isnull( a ) 当a为null的时候返回true,否则返回false
isnotnull ( a ) 当a为null的时候返回false,否则返回true

测试记录:

hive> > select isnull(1);OK_c0falseTime taken: 0.083 seconds, Fetched: 1 row(s)hive> select isnull(null);OK_c0trueTime taken: 0.072 seconds, Fetched: 1 row(s)hive> select isnull(\'\');OK_c0falseTime taken: 0.076 seconds, Fetched: 1 row(s)hive> > select isnull(\' \');OK_c0falseTime taken: 0.08 seconds, Fetched: 1 row(s)hive> > > select isnotnull(1);OK_c0trueTime taken: 0.081 seconds, Fetched: 1 row(s)hive> > select isnotnull(null);OK_c0falseTime taken: 0.065 seconds, Fetched: 1 row(s)hive> > select isnotnull(\'\');OK_c0trueTime taken: 0.071 seconds, Fetched: 1 row(s)hive> > select isnotnull(\' \');OK_c0trueTime taken: 0.086 seconds, Fetched: 1 row(s)hive> 

4.3 if函数

if(boolean testCondition, T valueTrue, T valueFalseOrNull) 当testCondition为true 返回 valueTrue,否则返回 valueFalseOrNull)

测试记录:

hive> > select if(1>2,\'abc\',\'def\');OK_c0defTime taken: 0.134 seconds, Fetched: 1 row(s)hive> > select if(1<2,\'abc\',\'def\');OK_c0abcTime taken: 0.1 seconds, Fetched: 1 row(s)

4.4 COALESCE函数

COALESCE(T v1, T v2, ...) 返回第一个不为null,当所有的都是null,就返回null

测试记录:

hive> > select coalesce(\'a\',\'b\',null);OK_c0aTime taken: 0.081 seconds, Fetched: 1 row(s)hive> > select coalesce(\'null\',\'b\',\'c\');OK_c0nullTime taken: 0.073 seconds, Fetched: 1 row(s)hive> > select coalesce(\'null\',null,null);OK_c0nullTime taken: 0.073 seconds, Fetched: 1 row(s)

4.5 CASE 函数

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END case条件判断语句
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END case值判断语句

代码:

-- 值判断,根据职位英文名判断职位中文名select e.empno, e.job, case when e.job = \'SALESMAN\' then \'销售员\' when e.job = \'MANAGER\' then \'管理人员\' else \'未知\' end as job_chinafrom emp e;-- 条件判断,小于某个时间点入职的是老员工,大于等于的是新员工select e.empno, e.hiredate, case e.job when \'SALESMAN\' then \'销售员\' when \'MANAGER\' then \'管理人员\'  else \'未知\' end as job_chinafrom emp e;-- 没有else语句且判断不成功的时候,会返回为空SELECT CASE \'c\' WHEN \'a\' THEN 1 WHEN \'b\' THEN 2 END is_c;

测试记录:

hive> > -- 值判断,根据职位英文名判断职位中文名 > select e.empno, > e.job, > case when e.job = \'SALESMAN\' then \'销售员\' >  when e.job = \'MANAGER\' then \'管理人员\' >  else \'未知\' end as job_china > from emp e;Query ID = root_20201209163913_c6290550-dfa0-4f82-891e-24673ce9c704Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0142, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0142/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0142Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 02020-12-09 16:39:21,669 Stage-1 map = 0%, reduce = 0%2020-12-09 16:39:27,895 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.68 secMapReduce Total cumulative CPU time: 6 seconds 680 msecEnded Job = job_1606698967173_0142MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Cumulative CPU: 6.68 sec HDFS Read: 12384 HDFS Write: 764 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 680 msecOKe.empno e.job job_china7499 SALESMAN 销售员7876 CLERK 未知7934 CLERK 未知7521 SALESMAN 销售员7654 SALESMAN 销售员7698 MANAGER 管理人员7788 ANALYST 未知7369 CLERK 未知7900 CLERK 未知7902 ANALYST 未知7566 MANAGER 管理人员7782 MANAGER 管理人员7839 PRESIDENT 未知7844 SALESMAN 销售员Time taken: 15.553 seconds, Fetched: 14 row(s)hive> > -- 条件判断,小于某个时间点入职的是老员工,大于等于的是新员工 > select e.empno, > e.hiredate, > case e.job when \'SALESMAN\' then \'销售员\' >  when \'MANAGER\' then \'管理人员\' >  else \'未知\' > end as job_china > from emp e;Query ID = root_20201209164150_7030d7f3-686c-4bb8-85e7-4f4e1db2dfefTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0143, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0143/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0143Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 02020-12-09 16:41:57,812 Stage-1 map = 0%, reduce = 0%2020-12-09 16:42:05,052 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.23 secMapReduce Total cumulative CPU time: 6 seconds 230 msecEnded Job = job_1606698967173_0143MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Cumulative CPU: 6.23 sec HDFS Read: 12252 HDFS Write: 808 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 230 msecOKe.empno e.hiredate job_china7499 1981-02-20 销售员7876 1987-06-13 未知7934 1982-01-23 未知7521 1981-02-22 销售员7654 1981-09-28 销售员7698 1981-05-01 管理人员7788 1987-06-13 未知7369 1980-12-17 未知7900 1981-12-03 未知7902 1981-12-03 未知7566 1981-04-02 管理人员7782 1981-06-09 管理人员7839 1981-11-17 未知7844 1981-09-08 销售员Time taken: 15.383 seconds, Fetched: 14 row(s)hive> -- 没有else语句且判断不成功的时候,会返回为空hive> SELECT CASE \'c\' > WHEN \'a\' THEN 1 WHEN \'b\' THEN 2 END is_c;OKis_cNULLTime taken: 0.082 seconds, Fetched: 1 row(s)hive> 

五.字符函数

下列是Hive常见的字符函数:

函数名 描述 返回值类型 ascii(string str) 返回字符的ascii值 int chr(bigint or double A) 通过ascii值返回对应字符 string concat(string or binary A, string or binary B...) 字符连接函数 string concat_ws(string SEP, string A, string B...) 字符连接函数,可以加连接符 string decode(binary bin, string charset) 将二进制文件转字符集 string elt(N int,str1 string,str2 string,str3 string,...) 返回指定的字符 string field(val T,val1 T,val2 T,val3 T,...) 返回字符在参数中出现的位置 int find_in_set(string str, string strList) 返回字符在字符串中出现的位置 int instr(string str, string substr) 返回substr在str中出现的位置 int length(string A) 字符串长度长度函数 int locate(string substr, string str[, int pos]) 返回在位置pos后的str中substr第一次出现的位置。 int lower(string A) lcase(string A) 将字符转为小写 string lpad(string str, int len, string pad) 字符填充函数,向左填充 string ltrim(string A) 截断字符串左边的空格 string repeat(string str, int n) 字符str重复n次 string replace(string A, string OLD, string NEW) 字符串替换函数 string reverse(string A) 字符反转函数 string rpad(string str, int len, string pad) 字符填充函数,从右边填充 string rtrim(string A) 截断字符串右边的空格 string space(int n) 空格函数,返回n个空格 string substr(string or binary A, int start, int len)
substring(string or binary A, int start, int len) 字符截取函数 string translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) 同postgresql的translate函数 string trim(string A) 截断字符串左边和右边的空格 string upper(string A) ucase(string A) 字符串大写函数 string

5.1 ascii值函数

ascii(string str) 返回字符的ascii值
chr(bigint or double A) 通过ascii值返回对应字符

这两个函数刚好是相对应的。

测试记录:

hive> > select ascii(\'a\') ascii_a,ascii(\'A\') ascii_a;OKascii_a ascii_a97 65Time taken: 0.104 seconds, Fetched: 1 row(s)hive> > select chr(97) chr_97,chr(65) chr_65;OKchr_97 chr_65a ATime taken: 0.075 seconds, Fetched: 1 row(s)

5.2 字符连接函数

concat(string or binary A, string or binary B...) 字符连接函数
concat_ws(string SEP, string A, string B...) 字符连接函数,可以加连接符

测试记录:

hive> > > select concat(\'A\',\'b\',\'DEF\');OK_c0AbDEFTime taken: 0.214 seconds, Fetched: 1 row(s)hive> > select concat_ws(\'-\',\'A\',\'b\',\'DEF\');OK_c0A-b-DEFTime taken: 0.063 seconds, Fetched: 1 row(s)hive> 

5.3 大小写函数

lower(string A) lcase(string A) 将字符转为小写
upper(string A) ucase(string A) 字符串大写函数

测试记录:

hive> > select lower(\'AbCdE\'),upper(\'AbCdE\');OK_c0 _c1abcde ABCDETime taken: 0.085 seconds, Fetched: 1 row(s)

5.4 字符填充函数

lpad(string str, int len, string pad) 字符填充函数,向左填充
rpad(string str, int len, string pad) 字符填充函数,从右边填充

测试记录:

hive> > SELECT LPAD(\'hi\',4,\'??\');OK_c0??hiTime taken: 0.065 seconds, Fetched: 1 row(s)hive> SELECT LPAD(\'hi\',1,\'??\');OK_c0hTime taken: 0.072 seconds, Fetched: 1 row(s)hive> SELECT RPAD(\'hi\',4,\'??\');OK_c0hi??Time taken: 0.064 seconds, Fetched: 1 row(s)hive> SELECT RPAD(\'hi\',1,\'??\');OK_c0hTime taken: 0.08 seconds, Fetched: 1 row(s)

5.5 空格函数

LTRIM() 左边去掉空格
TRIM() 去掉空格
RTRIM() 右边去掉空格
SPACE() 返回指定长度的空格

如果中间有空格需要去掉,可以使用replace

代码:

-- 去掉左边、两端、右边的空格select ltrim(\' abc def \'),trim(\' abc def \'),rtrim(\' abc def \');-- 返回10个空格select SPACE(10);

测试记录:

hive> select ltrim(\' abc def \'),trim(\' abc def \'),rtrim(\' abc def \');OK_c0 _c1 _c2abc def abc def abc defTime taken: 0.077 seconds, Fetched: 1 row(s)hive> select SPACE(10);OK_c0 Time taken: 0.062 seconds, Fetched: 1 row(s)

5.6 字符长度函数

length(string A) 字符串长度长度函数

测试记录:

hive> > select length(\'aaaaaabbbbcccc\');OK_c014Time taken: 0.081 seconds, Fetched: 1 row(s)

5.7 字符截取函数

SUBSTR() 截取字符
SUBSTRING_INDEX() 截取字符
INSTR() 字符出现位置函数

语法:
substr(string or binary A, int start, int len)
substring(string or binary A, int start, int len)
INSTR(str,substr)
locate(string substr, string str[, int pos])

代码:

-- 字符截取,从第5位开始截,len没有值,表示截取到最后select substr(\'abcdefghi\',5);-- 字符截取,从第5位开始截,截取2个字符select substr(\'abcdefghi\',5,2);-- 字符截取,从第-5位开始截,截取2个字符select substr(\'abcdefghi\',-5,2);-- \'.\'第一次出现的地方,保留左边的select substring_index(\'www.mysql.com\',\'.\',1);-- \'.\'最后一次出现的地方,保留右边的select substring_index(\'www.mysql.com\',\'.\',-1);-- \'.\'出现的位置select instr(\'www.mysql.com\',\'.\');-- \'.\'在第6个字符之后第一次出现的位置select locate(\'.\',\'hive.apache.org\',6);

测试记录:

hive> > > select substr(\'abcdefghi\',5);OK_c0efghiTime taken: 0.091 seconds, Fetched: 1 row(s)hive> select substr(\'abcdefghi\',5,2);OK_c0efTime taken: 0.066 seconds, Fetched: 1 row(s)hive> select substr(\'abcdefghi\',-5,2);OK_c0efTime taken: 0.064 seconds, Fetched: 1 row(s)hive> select substring_index(\'www.mysql.com\',\'.\',1);OK_c0wwwTime taken: 0.062 seconds, Fetched: 1 row(s)hive> select substring_index(\'www.mysql.com\',\'.\',-1);OK_c0comTime taken: 0.062 seconds, Fetched: 1 row(s)hive> select instr(\'www.mysql.com\',\'.\');OK_c04Time taken: 0.062 seconds, Fetched: 1 row(s)hive> select locate(\'.\',\'hive.apache.org\',6);OK_c012Time taken: 0.067 seconds, Fetched: 1 row(s)hive> 

5.8 字符替换函数

replace(string A, string OLD, string NEW) 字符串替换函数
translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) 同postgresql的translate函数

replace 会把string OLD当做一个整体来进行替换
translate 会把string or char or varchar from拆分成一个一个的字符,与string to一个一个对应进行替换

translate比replace功能更强大
translate(‘123abc’,‘2dc’,‘4e’): 将会把2->4,d->e,c-’’
translate(‘123abc’,‘2’,‘4e’): 将会把2->4

测试记录:

hive> > SELECT REPLACE(\'acdd\',\'cd\',\'ef\');OK_c0aefdTime taken: 0.093 seconds, Fetched: 1 row(s)hive> SELECT TRANSLATE(\'acdd\',\'cd\',\'ef\');OK_c0aeffTime taken: 0.072 seconds, Fetched: 1 row(s)hive> select translate(\'123abc\',\'2dc\',\'4e\');OK_c0143abTime taken: 0.073 seconds, Fetched: 1 row(s)hive> select translate(\'123abc\',\'2\',\'4e\');OK_c0143abcTime taken: 0.075 seconds, Fetched: 1 row(s)hive> 

5.9 其它函数

decode(binary bin, string charset) 将二进制文件转字符集
elt(N int,str1 string,str2 string,str3 string,...) 返回指定的字符
field(val T,val1 T,val2 T,val3 T,...) 返回字符在参数中出现的位置
repeat(string str, int n) 字符str重复n次
reverse(string A) 字符反转函数

从Oracle开发转过来的同学看到decode这个函数会觉得特别亲切,不过还是得把Hive的decode语法看一下,这两者完全不是一回事,这个decode是转字符集的。

测试记录:

hive> > select reflect(\'java.net.URLDecoder\', \'decode\',\'1\' , \"GBK\");Query ID = root_20201209172130_cafe84ad-3e66-4e4a-92a0-3868efd52957Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0145, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0145/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0145Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-09 17:21:39,319 Stage-1 map = 0%, reduce = 0%2020-12-09 17:21:45,570 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.88 secMapReduce Total cumulative CPU time: 2 seconds 880 msecEnded Job = job_1606698967173_0145MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.88 sec HDFS Read: 4726 HDFS Write: 101 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 880 msecOK_c01Time taken: 15.797 seconds, Fetched: 1 row(s)hive> > select elt(3,\'abc\',\'aaa\',\'def\',\'test\');OK_c0defTime taken: 0.084 seconds, Fetched: 1 row(s)hive> > select field(\'ab\',\'abc\',\'ba\',\'ab\',\'dab\');OK_c03hive> > > select repeat(\'abc\',10);OK_c0abcabcabcabcabcabcabcabcabcabcTime taken: 0.085 seconds, Fetched: 1 row(s)hive> > select reverse(\'abc\');OK_c0cbaTime taken: 0.078 seconds, Fetched: 1 row(s)

六.UDTF

UDTF 是表生成函数

这点与关系型数据库的区别比较大,下面通过几个例子来看看

代码:

-- explode (array)select explode(array(\'A\',\'B\',\'C\'));select explode(array(\'A\',\'B\',\'C\')) as col;select tf.* from (select 0) t lateral view explode(array(\'A\',\'B\',\'C\')) tf;select tf.* from (select 0) t lateral view explode(array(\'A\',\'B\',\'C\')) tf as col;-- explode (map)select explode(map(\'A\',10,\'B\',20,\'C\',30));select explode(map(\'A\',10,\'B\',20,\'C\',30)) as (key,value);select tf.* from (select 0) t lateral view explode(map(\'A\',10,\'B\',20,\'C\',30)) tf;select tf.* from (select 0) t lateral view explode(map(\'A\',10,\'B\',20,\'C\',30)) tf as key,value;

测试记录:

hive> > > > select explode(array(\'A\',\'B\',\'C\'));Query ID = root_20201210161223_3527486d-76e5-4306-b3aa-8521f89774d9Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0150, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0150/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0150Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:12:30,620 Stage-1 map = 0%, reduce = 0%2020-12-10 16:12:37,839 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.92 secMapReduce Total cumulative CPU time: 2 seconds 920 msecEnded Job = job_1606698967173_0150MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.92 sec HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 920 msecOKcolABCTime taken: 15.401 seconds, Fetched: 3 row(s)hive> select explode(array(\'A\',\'B\',\'C\')) as col;Query ID = root_20201210161249_acc85662-e686-4176-a5e8-032ed0992a45Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0151, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0151/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0151Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:12:56,243 Stage-1 map = 0%, reduce = 0%2020-12-10 16:13:02,434 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.8 secMapReduce Total cumulative CPU time: 2 seconds 800 msecEnded Job = job_1606698967173_0151MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.8 sec HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 800 msecOKcolABCTime taken: 14.35 seconds, Fetched: 3 row(s)hive> select tf.* from (select 0) t lateral view explode(array(\'A\',\'B\',\'C\')) tf;Query ID = root_20201210161305_5012fe10-4599-4ffe-838f-b867d212a10aTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0152, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0152/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0152Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:13:14,529 Stage-1 map = 0%, reduce = 0%2020-12-10 16:13:20,716 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.86 secMapReduce Total cumulative CPU time: 2 seconds 860 msecEnded Job = job_1606698967173_0152MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.86 sec HDFS Read: 5764 HDFS Write: 129 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 860 msecOKtf.colABCTime taken: 16.663 seconds, Fetched: 3 row(s)hive> select tf.* from (select 0) t lateral view explode(array(\'A\',\'B\',\'C\')) tf as col;Query ID = root_20201210161324_8f1bcaa5-931d-49c1-8dbd-5da936bfa67fTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0153, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0153/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0153Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:13:32,968 Stage-1 map = 0%, reduce = 0%2020-12-10 16:13:40,179 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.99 secMapReduce Total cumulative CPU time: 2 seconds 990 msecEnded Job = job_1606698967173_0153MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.99 sec HDFS Read: 5778 HDFS Write: 129 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 990 msecOKtf.colABCTime taken: 17.102 seconds, Fetched: 3 row(s)hive> select explode(map(\'A\',10,\'B\',20,\'C\',30));Query ID = root_20201210161354_1c522046-cac8-4182-9e28-9789a17c1a98Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0154, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0154/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0154Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:14:00,978 Stage-1 map = 0%, reduce = 0%2020-12-10 16:14:08,199 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.96 secMapReduce Total cumulative CPU time: 2 seconds 960 msecEnded Job = job_1606698967173_0154MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.96 sec HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 960 msecOKkey valueA 10B 20C 30Time taken: 15.114 seconds, Fetched: 3 row(s)hive> select explode(map(\'A\',10,\'B\',20,\'C\',30)) as (key,value);Query ID = root_20201210161411_59b615ac-2733-40c2-bec4-1a17f593140fTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0155, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0155/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0155Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:14:19,606 Stage-1 map = 0%, reduce = 0%2020-12-10 16:14:25,805 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.95 secMapReduce Total cumulative CPU time: 2 seconds 950 msecEnded Job = job_1606698967173_0155MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.95 sec HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 950 msecOKkey valueA 10B 20C 30Time taken: 15.767 seconds, Fetched: 3 row(s)hive> select tf.* from (select 0) t lateral view explode(map(\'A\',10,\'B\',20,\'C\',30)) tf;Query ID = root_20201210161428_5b45482b-39c0-4a6f-9e67-ec640e03d816Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0156, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0156/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0156Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:14:38,792 Stage-1 map = 0%, reduce = 0%2020-12-10 16:14:46,098 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.92 secMapReduce Total cumulative CPU time: 2 seconds 920 msecEnded Job = job_1606698967173_0156MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.92 sec HDFS Read: 5987 HDFS Write: 138 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 920 msecOKtf.key tf.valueA 10B 20C 30Time taken: 18.456 seconds, Fetched: 3 row(s)hive> select tf.* from (select 0) t lateral view explode(map(\'A\',10,\'B\',20,\'C\',30)) tf;Query ID = root_20201210161448_4ebe123f-57b7-4869-9e0e-557cccb27a10Total jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there\'s no reduce operatorStarting Job = job_1606698967173_0157, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0157/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0157Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02020-12-10 16:14:57,112 Stage-1 map = 0%, reduce = 0%2020-12-10 16:15:03,299 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.74 secMapReduce Total cumulative CPU time: 2 seconds 740 msecEnded Job = job_1606698967173_0157MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.74 sec HDFS Read: 5987 HDFS Write: 138 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 740 msecOKtf.key tf.valueA 10B 20C 30Time taken: 16.725 seconds, Fetched: 3 row(s)hive> 

参考