YARN: cluster nodes lost, still no usable nodes after a restart

左小米z, 2022-04-08
Tags: yarn, hadoop

Troubleshooting:

The YARN web UI showed 0 for every available resource. The screenshot below was taken after the fix.

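I searched through a lot of material online. Some posts suggest turning off yarn.nodemanager.vmem-check-enabled, which disables the virtual memory check on containers. I tried it and it did not help.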

Next I watched the Hadoop logs:

tailf hadoop-root-nodemanager-craw-node212.log

tailf hadoop-root-resourcemanager-craw-node212.log

hadoop-root-nodemanager-craw-node212.log printed the following:

2022-04-08 14:00:50,775 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir error, used space above threshold of 90.0%, removing from list of valid directories
2022-04-08 14:00:50,775 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /home/data/software/hadoop-3.2.2/logs/userlogs error, used space above threshold of 90.0%, removing from list of valid directories
2022-04-08 14:00:50,776 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed: 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/logs/userlogs : used space above threshold of 90.0% ] 
2022-04-08 14:00:50,776 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/data/software/hadoop-3.2.2/logs/userlogs : used space above threshold of 90.0% ] 
2022-04-08 14:00:50,797 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@54e041a4
2022-04-08 14:00:50,798 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
2022-04-08 14:00:50,800 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2022-04-08 14:00:50,800 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: AMRMProxyService is disabled
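In a long log these warnings are easy to miss. The following is a minimal Python sketch that pulls the rejected directories out of NodeManager log lines. The regular expression is modeled only on the DirectoryCollection warnings shown above, and the function name full_directories is made up for this example, so other Hadoop versions may need a different pattern.

```python
import re

# Modeled on the DirectoryCollection warning shown in the log above;
# other Hadoop versions may word the message differently (assumption).
PATTERN = re.compile(
    r"DirectoryCollection: Directory (?P<path>\S+) error, "
    r"used space above threshold of (?P<pct>[\d.]+)%"
)


def full_directories(log_lines):
    """Return unique (path, threshold) pairs for directories reported as too full."""
    found = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            pair = (match.group("path"), float(match.group("pct")))
            if pair not in found:
                found.append(pair)
    return found
```

To use it, open the log file and pass the file object (an iterable of lines) to full_directories, for example with open("hadoop-root-nodemanager-craw-node212.log").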

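The cause: the NodeManager detected that local disk usage was above 90%, so it removed both the local-dirs and the log-dirs from its list of valid directories.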
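To see how close a directory's filesystem is to that limit, a rough check along the following lines may help. This is only a sketch: the helper names used_percent and over_threshold are made up for this example, and the percentage is an approximation based on the space available to the current user, so it may differ slightly from the value the NodeManager computes.

```python
import shutil


def used_percent(path):
    """Approximate used-space percentage of the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    # Treat everything that is not available to the current user as used.
    return 100.0 * (usage.total - usage.free) / usage.total


def over_threshold(paths, threshold=90.0):
    """Return {path: used %} for each path whose filesystem is above threshold."""
    result = {}
    for path in paths:
        pct = used_percent(path)
        if pct > threshold:
            result[path] = round(pct, 1)
    return result
```

For the node in this post, the paths to pass would be the two directories named in the log: /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir and /home/data/software/hadoop-3.2.2/logs/userlogs.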

Solutions:

1 Delete everything on the node that is no longer needed, until disk usage drops below 90%.

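2 Add the following to yarn-site.xml to change the disk health checker's lower and upper limits (the minimum fraction of healthy disks and the maximum utilization allowed per disk):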

  <property>
     <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
     <value>0.0</value>
  </property>
  <property>
     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
     <value>100.0</value>
  </property>
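To confirm which disk health checker values a given yarn-site.xml actually sets, something like the sketch below could be used. It assumes the usual Hadoop layout of property elements with name and value children. The function name disk_checker_settings is invented for this example.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.nodemanager.disk-health-checker."


def disk_checker_settings(xml_text):
    """Return {name: value} for the disk-health-checker properties in the XML text."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(PREFIX):
            settings[name] = (prop.findtext("value") or "").strip()
    return settings
```

Pass it the contents of yarn-site.xml. Properties that are missing from the returned dictionary are not set in that file, so the cluster defaults apply to them.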

In addition, the error above also triggers the error below. Fixing the problem above makes it go away as well.

2022-04-08 14:06:51,033 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Could not carry out resource dir checks for /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir, which was marked as good
java.io.FileNotFoundException: File /home/data/software/hadoop-3.2.2/data/tmp/nm-local-dir/filecache does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:668)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:989)