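Determine gender from the ID card number: in an 18-digit ID number the second-to-last digit is even for female (女) and odd for male (男); in a 15-digit ID number the last digit is even for female and odd for male.
HQL implementation: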
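SELECT name,
       CASE
         WHEN LENGTH(id) = 18 THEN
           -- 18-digit ID: the second-to-last digit encodes gender
           CASE
             WHEN SUBSTR(id, -2, 1) % 2 = 0 THEN "女"  -- even: female
             WHEN SUBSTR(id, -2, 1) % 2 = 1 THEN "男"  -- odd: male
             ELSE "Unknown"                            -- digit is not numeric
           END
         WHEN LENGTH(id) = 15 THEN
           -- 15-digit ID: the last digit encodes gender
           CASE
             WHEN SUBSTR(id, -1, 1) % 2 = 0 THEN "女"
             WHEN SUBSTR(id, -1, 1) % 2 = 1 THEN "男"
             ELSE "Unknown"
           END
         ELSE "非法"  -- invalid length
       END AS `性别`  -- alias means "gender"; Hive quotes identifiers with backticks
FROM idcarda.idcard;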
Writing the UDF in Python
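# -*- coding: utf-8 -*-
import sys

# Each input line is expected to be "name<TAB>idcard".
for line in sys.stdin:
    detail = line.strip().split("\t")
    # Skip rows that do not have exactly two fields.
    if len(detail) != 2:
        continue
    name, idcard = detail
    if len(idcard) == 15:
        # 15-digit ID: the last digit encodes gender.
        digit = idcard[-1]
    elif len(idcard) == 18:
        # 18-digit ID: the second-to-last digit encodes gender.
        digit = idcard[-2]
    else:
        digit = ""
    if not digit.isdigit():
        # Wrong length or non-numeric gender digit; the message means "invalid ID".
        print("\t".join([name, idcard, "身份信息不合法!"]))
    elif int(digit) % 2 == 0:
        print("\t".join([name, idcard, "女"]))  # even: female
    else:
        print("\t".join([name, idcard, "男"]))  # odd: male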
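The script reads its input from standard input. Hive streams each row to the script as one line, with fields separated by \t by default, so the script splits every line on \t. It then determines the person's gender from the second-to-last digit of idcard (or from the last digit when the number has 15 digits).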
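To make the splitting step concrete, here is a minimal sketch of what happens to a single streamed row. The sample row reuses the first record from the test output further down; the variable names gender_digit and label are illustrative only and do not appear in the script.

```python
# One row as the script would receive it on stdin: fields joined by a tab.
line = "neil\t411325199308110030\n"

# strip() drops the trailing newline; split("\t") separates the two fields.
name, idcard = line.strip().split("\t")

# For an 18-character ID the gender digit is the second-to-last character.
gender_digit = int(idcard[-2])
label = "女" if gender_digit % 2 == 0 else "男"  # even = female, odd = male

print(name, idcard, gender_digit, label)
```

Negative indexing keeps the lookup independent of the total length, which is why the same pattern works for idcard[-1] on 15-digit numbers.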
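When the script fails, the error message is not very detailed, so it helps to test the Python script on its own first by piping a file into it with cat.
Run the following command in a terminal: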
cat person.txt|python person.py
The output is:
neil 411325199308110030 男
pony 41132519950911004x 女
jack 12312423454556561 身份信息不合法!
tony 123124234545565 男
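The same check can also be run without a terminal or a person.txt file. The sketch below is one possible arrangement, not the original script: it assumes the logic is moved into two hypothetical helpers, classify and run, and feeds them the four sample rows above through io.StringIO as a stand-in for standard input.

```python
import io


def classify(idcard):
    """Return 女 (female), 男 (male), or the invalid-ID message for one ID string."""
    if len(idcard) == 15:
        digit = idcard[-1]   # 15-digit ID: last digit
    elif len(idcard) == 18:
        digit = idcard[-2]   # 18-digit ID: second-to-last digit
    else:
        return "身份信息不合法!"
    if not digit.isdigit():
        return "身份信息不合法!"
    return "女" if int(digit) % 2 == 0 else "男"


def run(stream):
    """Yield one tab-joined output line per well-formed input line."""
    for line in stream:
        detail = line.strip().split("\t")
        if len(detail) != 2:
            continue  # skip malformed rows, as the script does
        name, idcard = detail
        yield "\t".join([name, idcard, classify(idcard)])


# Stand-in for person.txt, built from the four rows shown above.
sample = io.StringIO(
    "neil\t411325199308110030\n"
    "pony\t41132519950911004x\n"
    "jack\t12312423454556561\n"
    "tony\t123124234545565\n"
)
results = list(run(sample))
for row in results:
    print(row)
```

Keeping the rule in a plain function means it can be asserted on directly, while a script that wraps run(sys.stdin) would still behave like the cat pipeline.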