Hadoop HAR Archiving in Practice

纽二 2023-07-01
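  • Symptom:
  • The Databus real-time sync job failed.
  • [screenshot omitted]
  • Error:
  • [screenshot omitted]
  • Conclusion:
  • The HDFS directory has exceeded the maximum number of entries a single directory may hold. The default limit is 1048576 (dfs.namenode.fs-limits.max-directory-items).
  • Directory statistics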

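# Count the entries in the directory (the listing starts with a "Found N items" header, so the result is one higher than the real count)
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls -h /databus_online_class/class/class_stock_relation | wc -l


# Show the last 10 files in the listing (file names start with a timestamp, so these are the newest)
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls -h /databus_online_class/class/class_stock_relation | tail -10


# Search the Ranger audit logs for access to the directory
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -text /ranger/audit/hdfs/202305*/* | grep '/databus_online_class/class/class_stock_relation'


# Delete files in the directory, bypassing the trash
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -rm -skipTrash /databus_online_class/class/class_stock_relation/2020*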
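
The limit and a directory's remaining headroom can also be checked without listing every file. The sketch below assumes the limit comes from `dfs.namenode.fs-limits.max-directory-items` and that the directory is flat, because `hdfs dfs -count` is recursive while the limit applies to direct children only. The function name `dir_headroom` is made up for illustration, and `hdfs getconf` reports the client-side configuration, which may differ from the value the NameNode actually enforces.

```shell
#!/bin/bash
# Hypothetical helper: print "<children> <limit> <remaining>" for one HDFS directory.
# Assumes a flat directory: `hdfs dfs -count` is recursive, whereas the
# NameNode limit only counts direct children.
dir_headroom() {
    local dir=$1 limit children
    # Client-side view of the limit; the NameNode's own value may differ.
    limit=$(hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items)
    # -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    # DIR_COUNT includes the directory itself, hence the "- 1".
    children=$(hdfs dfs -count "$dir" | awk '{print $1 + $2 - 1}')
    echo "$children $limit $((limit - children))"
}
```

Calling `dir_headroom /databus_online_class/class/learning_progress` would print the entry count, the limit, and how many more entries fit before writes start failing.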

  • /databus_online_class/class/class_stock_relation
      • count: 926133
      • tail: 2022121504-20221215041000-9.gz
      • last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
  • /databus_online_class/class/flow_compensate_operate
      • count: 1048577 (full)
      • tail: 2023020311-20230205194000-0.gz.tmp
      • last visit: write requests seen, but they failed; no read requests
  • /databus_online_class/class/learning_progress
      • count: 1036229
      • tail: 2023051214-20230516011000-0.gz.tmp
      • last visit: write requests seen and succeeded; no read requests
  • /databus_online_class/class/online_class
      • count: 1048577 (full)
      • tail: 2022121506-20221215070000-0.gz
      • last visit: write requests seen, but they failed; no read requests
  • /databus_online_class/class/online_class_extend
      • count: 970881
      • tail: 2022121506-20221215070000-0.gz
      • last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
  • /databus_online_class/class/online_class_student
      • count: 983171
      • tail: 2022121506-20221215070000-0.gz
      • last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
  • /databus_online_class/class/order_compensate_operate
      • count: 7128
      • tail: 2021080720-20210807204000-7.gz
      • last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
  • /databus_online_class/class/require_class
      • count: 614347
      • tail: 2022120908-20221209084000-0.gz
      • last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
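
The count and tail figures in a report like this can be collected in one pass rather than by running the commands by hand for each directory. The following is a minimal sketch, not the author's method: `summarize_listing` and `report_subdirs` are hypothetical helpers. The first reads `hdfs dfs -ls` output on stdin, skips the `Found N items` header, and prints the entry count and the last file name. The second relies on `hdfs dfs -ls -C` (paths only), which is assumed to be available in the installed Hadoop version.

```shell
#!/bin/bash
# Hypothetical helper: summarize `hdfs dfs -ls` output read from stdin.
# Prints "count:<n> tail:<file name of the last entry>".
summarize_listing() {
    awk '
        /^Found [0-9]+ items$/ { next }      # header line, skip it
        NF >= 8 { n++; last = $NF }          # a regular listing row
        END {
            sub(/.*\//, "", last)            # keep only the file name
            printf "count:%d tail:%s\n", n, last
        }'
}

# Hypothetical driver: one summary line per subdirectory of $1.
report_subdirs() {
    # -C prints bare paths only, one per line.
    hdfs dfs -ls -C "$1" | while read -r sub; do
        printf '%s ' "$sub"
        HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls "$sub" </dev/null | summarize_listing
    done
}
```

Running `report_subdirs /databus_online_class/class` would print one line per subdirectory. Unlike `wc -l`, the count here excludes the header line.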
  • Solution
  • Archive the old files with hadoop archive

#!/bin/bash
# For each directory and year: stage the files under /tmp/backup with distcp,
# then pack the staged directory into a single HAR file.

year_arr=(2019 2020 2021 2022)
dir_arr=(flow_compensate_operate online_class_extend online_class_student order_compensate_operate require_class)

source_dir=/databus_online_class/class
tmp_dir=/tmp/backup

for dir in "${dir_arr[@]}"; do
        for year in "${year_arr[@]}"; do
                # Create the staging directory for this directory and year
                echo "hdfs dfs -mkdir -p $tmp_dir/$dir/$year"
                hdfs dfs -mkdir -p "$tmp_dir/$dir/$year"

                # Copy the year's files; the glob is quoted so that Hadoop, not the local shell, expands it
                echo "HADOOP_CLIENT_OPTS=\"-Xmx20480m\" hadoop distcp -m 400 $source_dir/$dir/$year* $tmp_dir/$dir/$year/"
                HADOOP_CLIENT_OPTS="-Xmx20480m" hadoop distcp -m 400 "$source_dir/$dir/$year*" "$tmp_dir/$dir/$year/"

                # Pack the staged directory into $tmp_dir/$dir/${year}_history.har
                echo "HADOOP_CLIENT_OPTS=\"-Xmx8192m\" hadoop archive -archiveName ${year}_history.har -p $tmp_dir/$dir/$year $tmp_dir/$dir"
                HADOOP_CLIENT_OPTS="-Xmx8192m" hadoop archive -archiveName "${year}_history.har" -p "$tmp_dir/$dir/$year" "$tmp_dir/$dir"

                echo -----------
                sleep 60s
        done
done
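
The script only copies and archives. Nothing is removed from the source directories, so they stay at the limit until the originals are deleted. Before deleting anything, it seems prudent to confirm that the archive is readable through the `har://` scheme and holds as many files as the staged copy. The sketch below is one possible check under those assumptions; `count_files` and `verify_har` are hypothetical helpers, and the paths follow the script's layout.

```shell
#!/bin/bash
# Hypothetical helper: number of regular files under an HDFS (or har://) path.
count_files() {
    # Rows for regular files start with "-" in `hdfs dfs -ls -R` output.
    hdfs dfs -ls -R "$1" | grep -c '^-'
}

# Hypothetical helper: compare the staged directory with its archive.
# Succeeds only when both hold the same, non-zero number of files.
verify_har() {
    local staged=$1 har=$2 expected actual
    expected=$(count_files "$staged")
    # An archive is addressed through the har:// scheme.
    actual=$(count_files "har://$har")
    echo "staged=$expected archived=$actual"
    [ "$expected" -gt 0 ] && [ "$expected" -eq "$actual" ]
}
```

For example, `verify_har /tmp/backup/require_class/2022 /tmp/backup/require_class/2022_history.har` returns success only when the counts agree. The matching originals could then be removed with the `hdfs dfs -rm -skipTrash` command shown earlier. A matching file count does not prove the contents are identical, so keeping the staged copy for a while is a reasonable fallback.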
