Common HDFS Commands and Client API Operations


Note

1) Remember to take a snapshot of your ECS server first. Snapshot, snapshot, snapshot.

How to take a snapshot of an ECS server?

2) About the reference material (videos)

The videos in the Baidu netdisk link (see the References at the end of this post) cover both the server environment setup and the Eclipse client environment setup. The server setup is explained well, but the client setup videos only cover Windows 7 and Windows 10, and the bundled files (jar files) are likewise for Win7 and Win10 only; if you want to set up the environment on Windows 8, sorry, they cannot help you. Everything else is fine.

Server environment setup (Alibaba Cloud ECS + pseudo-distributed mode)

An easy-to-follow, illustrated environment setup guide

HDFS shell commands

Prerequisite: the environment set up above.

List directory contents (like ls in Linux)

hadoop fs -ls <HDFS path>

hadoop fs -ls /


hadoop fs -ls -R <HDFS path>        # list recursively

hadoop fs -ls -R  /


Create a directory in HDFS

The -p flag creates parent directories as needed (multi-level directories).

hadoop fs -mkdir -p  /sanguo/shuguo


Move (cut) a file from the local filesystem to HDFS

This is a move, not a copy: the local file is removed after the upload.

Move panjinlian.txt from the current local directory to /sanguo/shuguo/ in HDFS:

hadoop fs -moveFromLocal ./panjinlian.txt /sanguo/shuguo/


Copy a file from the local filesystem to an HDFS path

Copy README.txt from the current local directory to the HDFS root directory:

hadoop fs -copyFromLocal README.txt /

Append a file to the end of an existing HDFS file

HDFS only supports appending to existing files; it does not support modifying them in place.

hadoop fs -appendToFile <local file path> <HDFS path>

hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt
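
The same operation is available from the Java API via FileSystem.append. A minimal sketch, assuming the same connection settings and imports used in the Java client examples later in this post (the target HDFS file must already exist):

Configuration configuration = new Configuration();
// reach DataNodes by hostname, same setting as in the client examples below
configuration.set("dfs.client.use.datanode.hostname", "true");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// open an existing HDFS file for append and write a few extra bytes
FSDataOutputStream out = fs.append(new Path("/sanguo/shuguo/kongming.txt"));
out.write("one more line\n".getBytes("UTF-8"));

out.close();
fs.close();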

Display the contents of a file in HDFS

hadoop fs -cat <HDFS path>

hadoop fs -cat /sanguo/shuguo/panjinlian.txt


Copy a file from HDFS to the local filesystem

hadoop fs -copyToLocal <HDFS path> <local path>

hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./

Copy from one HDFS path to another HDFS path

hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt

Move a file within HDFS

hadoop fs -mv /zhuge.txt /sanguo/shuguo/

-get downloads a file from HDFS to the local filesystem; it is equivalent to -copyToLocal.

hadoop fs -get <HDFS path> <local path>

hadoop fs -get /sanguo/shuguo/kongming.txt ./

Download and merge multiple files

For example, if the HDFS directory /user/atguigu/test contains several files (log.1, log.2, log.3, ...), merge them into a single local file zaiyiqi.txt:

hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt

-put uploads a local file to HDFS; it is equivalent to -copyFromLocal.

hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/

Delete a file in HDFS

hadoop fs -rm /user/test/jinlian2.txt

Recursive delete:

hadoop fs -rm  -r  /user/test/jinlian2.txt

Setting up the HDFS client (Eclipse, etc.)

This part really is a bit tricky. I searched around online, but in the end I set up my environment by following the videos in the Baidu netdisk linked in the References at the end of this post.

HDFS client API operations

Prerequisite (this is crucial; skip it and you will hit problems later): edit the hosts file on your local (client) machine.

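What to add: one line mapping your ECS server's public IP to its hostname. A minimal sketch, assuming the example IP and hostname used in the code samples below (use your own server's values); on Windows the file is C:\Windows\System32\drivers\etc\hosts:

47.105.133.99  a99qngm2v98asii1aZ   # example values: your server's public IP and its hostname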

 

Testing the client environment


package com.imooc.hdfs;

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClient {

    public static void main(String[] args) throws Exception {
        // 1 Get the file system
        Configuration configuration = new Configuration();
        // Connect to the cluster:
        //   the IP in the URI is your ECS server's public IP
        //   "root" is the user name to access HDFS as
        FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

        // 2 Create a directory
        fs.mkdirs(new Path("/20190307/CBeann"));

        // 3 Release resources
        fs.close();
        System.out.println("--------over---------");
    }

}
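
If the program runs without errors, listing the HDFS root on the server (hadoop fs -ls /) should now show the newly created /20190307 directory.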

Notes on the parameters used in the code samples below

configuration.set("fs.defaultFS", "hdfs://a99qngm2v98asii1aZ:8020");

hdfs://<hostname>:8020

Here <hostname> is your ECS server's hostname.


 FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

Here "root" is the user name the client accesses HDFS as (on this setup, the user that owns the HDFS files on the server).


copyFromLocalFile: file upload

public static void main(String[] args) throws Exception {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    // reach DataNodes by hostname instead of their private IP (needed when the client is outside the cloud network)
    configuration.set("dfs.client.use.datanode.hostname", "true");
    configuration.set("fs.defaultFS", "hdfs://a99qngm2v98asii1aZ:8020");
    FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

    // 2 Upload the file; parameters: source path (local), destination path (HDFS)
    fs.copyFromLocalFile(new Path("e:/temp/hello.txt"), new Path("/hello.txt"));

    // 3 Release resources
    fs.close();
    System.out.println("over");
}

copyToLocalFile: file download

// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://ea99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Download the file
// boolean delSrc: whether to delete the source file in HDFS
// Path src: the HDFS path to download
// Path dst: the local path to download to
// boolean useRawLocalFileSystem: use the raw local file system (true skips writing a local .crc checksum file)
fs.copyToLocalFile(false, new Path("/hello.txt"), new Path("e:/temp/helloword.txt"), true);

// 3 Release resources
fs.close();

System.out.println("-----------over--------------");

delete: file deletion

// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://a99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Delete
// Path: the path to delete
// boolean: whether to delete recursively (required for non-empty directories)
fs.delete(new Path("/hello.txt"), true);

// 3 Release resources
fs.close();

System.out.println("-----------over--------------");

rename: renaming a file

// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://a99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Rename the file
fs.rename(new Path("/hello.txt"), new Path("/helloworld.txt"));

// 3 Release resources
fs.close();
System.out.println("-----------over--------------");

Viewing HDFS file details


// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://ea99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Get file details (recursively list every file under /)
RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

while (listFiles.hasNext()) {
    LocatedFileStatus status = listFiles.next();

    // print the details
    // file name
    System.out.println("file name: " + status.getPath().getName());
    // length
    System.out.println("length: " + status.getLen());
    // permission
    System.out.println("permission: " + status.getPermission());
    // group
    System.out.println("group: " + status.getGroup());

    // block locations
    BlockLocation[] blockLocations = status.getBlockLocations();
    System.out.println("block locations:");
    for (BlockLocation blockLocation : blockLocations) {

        // hosts that store this block
        String[] hosts = blockLocation.getHosts();

        for (String host : hosts) {
            System.out.println(host);
        }
    }

    System.out.println("-----------separator----------");
}

// 3 Release resources
fs.close();

System.out.println("-----over-----");

Checking whether an HDFS entry is a file or a directory


// 1 Get the configuration and file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://ea99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Check whether each entry under / is a file or a directory
FileStatus[] listStatus = fs.listStatus(new Path("/"));

for (FileStatus fileStatus : listStatus) {

    // if it is a file
    if (fileStatus.isFile()) {
        System.out.println("f:" + fileStatus.getPath().getName());
    } else {
        System.out.println("d:" + fileStatus.getPath().getName());
    }
}

// 3 Release resources
fs.close();
System.out.println("----over----");

HDFS I/O stream operations

Uploading a file with streams

Goal

Upload hello.txt from the local E: drive to the HDFS root directory.

Code

// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://ea99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Create the input stream (local file)
FileInputStream fis = new FileInputStream(new File("e:/temp/hello.txt"));

// 3 Get the output stream (HDFS file)
FSDataOutputStream fos = fs.create(new Path("/hello.txt"));

// 4 Copy the stream
IOUtils.copyBytes(fis, fos, configuration);

// 5 Release resources
IOUtils.closeStream(fos);
IOUtils.closeStream(fis);
fs.close();

System.out.println("over");

Downloading a file with streams

Goal

Download a file from HDFS (here /hello.txt) to the local E: drive.

Code

// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname", "true");
configuration.set("fs.defaultFS", "hdfs://ea99qngm2v98asii1aZ:8020");
FileSystem fs = FileSystem.get(new URI("hdfs://47.105.133.99:9000"), configuration, "root");

// 2 Get the input stream (HDFS file)
FSDataInputStream fis = fs.open(new Path("/hello.txt"));

// 3 Get the output stream (local file)
FileOutputStream fos = new FileOutputStream(new File("e:/temp/helloworld.txt"));

// 4 Copy the stream
IOUtils.copyBytes(fis, fos, configuration);

// 5 Release resources
IOUtils.closeStream(fos);
IOUtils.closeStream(fis);
fs.close();

System.out.println("over");

Common problems

Problem 1: File /hdfsapi/test/a.txt could only be replicated to 0 nodes instead of minReplication (=1)

Solution (please, please, please edit your local hosts file before writing any code). The log below shows the client being pointed at the DataNode's private address (172.31.236.96), which is not reachable from outside the cloud's internal network; mapping the server's hostname to its public IP in the client's hosts file, together with the dfs.client.use.datanode.hostname setting used in the code above, avoids this.


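A minimal recap of the fix, as a sketch using the same values as the examples above (substitute your own server's public IP and hostname):

1) On the client machine, add the mapping from the server's public IP to its hostname to the hosts file, as described in the prerequisite section above.

2) In the client code, connect to DataNodes by hostname rather than by their private IP (this setting already appears in every code sample above):

configuration.set("dfs.client.use.datanode.hostname", "true");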

Exception details:

2019-03-09 16:26:29,406 INFO [org.apache.hadoop.hdfs.DFSClient] - Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1537)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1313)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2019-03-09 16:26:29,470 INFO [org.apache.hadoop.hdfs.DFSClient] - Abandoning BP-109356787-172.31.236.96-1547785821831:blk_1073741830_1006
2019-03-09 16:26:29,626 INFO [org.apache.hadoop.hdfs.DFSClient] - Excluding datanode DatanodeInfoWithStorage[172.31.236.96:50010,DS-96fc6538-dec0-4c3b-a8fb-8f73908a3370,DISK]
2019-03-09 16:26:29,809 WARN [org.apache.hadoop.hdfs.DFSClient] - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hello.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hello.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

Problem 2: to be continued.

References

尚硅谷 (Atguigu) big data Hadoop video course

Baidu netdisk link:

Link: https://pan.baidu.com/s/1E6EHfcux4Pnb8YD29tZvXw
Extraction code: dd2x
