0
点赞
收藏
分享

微信扫一扫

Hive内嵌字符处理函数:regexp_extract,regexp_replace,split,replace,translate


1.Hive内嵌函数说明

        官方地址:​​hive函数大全官方地址​​

       Hive字符处理函数系列1:​​hive内嵌字符串函数1​​

String Functions


Return Type



Name(Signature)



Description





regexp_extract(string subject, string pattern, int index)



Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method..

抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格

hive>   select   regexp_extract('foothebar', 'foo(.*?)(bar)', 2)   ---注意2是匹配分组的

             'bar.'



regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)


Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc..

按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉  如:regexp_replace("foobar", "oo|ar", "") = 'fb.' 注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格


string

replace(string A, string OLD, string NEW)

Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of ​​Hive 1.3.0 and 2.1.0​​). Example: select replace("ababab", "abab", "Z"); returns "Zab".


split(string str, string pat)


Splits str around pat (pat is a regular expression)..

按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回



string



translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)



Translates the input string by replacing the characters present in the ​​from​​ string with the corresponding characters in the ​​to​​ string. This is similar to the ​​translate​​ function in ​​PostgreSQL​​. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive ​​0.10.0​​, for string types)

Char/varchar support added as of ​​Hive 0.14.0​​.

解释:

hive>select translate('abcdefga','abc','wo')

       wodefgw   注意,结果不是wodefga,

hive>select replace('abcdefga','abc','wo')

       wodefga   注意,两个结果,这就是replace和translate的区别。






map<string,string>



str_to_map(text[, delimiter1, delimiter2])



Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2.

解释:

使用两个分隔符将文本拆分为键值对。

分隔符1将文本分成K-V对,分隔符2分割每个K-V对。对于分隔符1默认分隔符是 ',',对于分隔符2默认分隔符是 '='

hive> select 
str_to_map(concat(path_id,':',filter_name ))   ---是map类型的
from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D 
group by path_id,filter_Name 


hive>


2.split,regexp_replace,regexp_extract的使用

2.1 split函数,支持使用正则表达式对字符串进行切割,返回值为数组

SELECT 
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [0] AS A,
--对url进行使用?进行切割,返回值是数组,这里取?前面的值。
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [1] AS A1,
SPLIT('http://facebook.com/index.html','\\?') [0] AS B ,
SPLIT('http://facebook.com/index.html','\\?') [1] AS B1,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\\w\\/') [0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED

Hive内嵌字符处理函数:regexp_extract,regexp_replace,split,replace,translate_Hive函数

 注意:所有正则表达式中的预定义字符比如?,},|等需要在这里用\\进行反转义才能表达本意。比如正则表达式中\w表示匹配字母,所以也属于预定义字符,单独的\w表示匹配的是字母w,而\\\w才表示匹配字母。

2.2. regexp_replace函数,比较简单,难的是里面参数正则表达式的书写。

select 
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid
else null end uniscid,
from fdm_sor.aaaaaaaaaaaaaaa;

2.3 replace和translate的区别


  1. replace:字符串级别的代替
  2. translate:字符级别的代替 ​​hive>select translate('abcdefga','abc','wo') wodefgw 注意,结果不是wodefga, hive>select replace('abcdefga','abc','wo') wodefga 注意,两个结果,这就是replace和translate的区别。​​ 



举报

相关推荐

MySQL正则表达式regexp_replace函数

0 条评论