Installing Hadoop 3.x on Windows and Developing Locally in a Windows Environment
Download and install
Official site: https://hadoop.apache.org/
Download page: https://archive.apache.org/dist/hadoop/common/
Download the hadoop .tar.gz archive for your version and extract it to the target directory (this guide uses D:\Development\Hadoop).
Visit https://github.com/cdarlint/winutils and pick the winutils.exe and hadoop.dll that match your Hadoop version.
Copy winutils.exe and hadoop.dll into the Hadoop bin directory and into C:\Windows\System32, then restart the computer.
Configure the environment variables
HADOOP_HOME:D:\Development\Hadoop
HADOOP_USER_NAME:root
Path:%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;
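Open a new command prompt and run hadoop version to confirm that the environment variables have taken effect.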
Configure Hadoop
Check the JDK setting in the hadoop-env.cmd file; usually no change is needed:
set JAVA_HOME=%JAVA_HOME%
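Note: if JAVA_HOME resolves to a path containing spaces (for example under C:\Program Files), the Windows scripts may fail to start. In that case set a space-free path explicitly in hadoop-env.cmd, e.g. using the 8.3 short form (the path below is only an example):
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_291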
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>D:\Development\Hadoop\data\tmp</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>D:\Development\Hadoop\data\namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>D:\Development\Hadoop\data\datanode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
</configuration>
Format the NameNode: hdfs namenode -format. A successful format ends with output like the following:
2022-04-15 21:21:54,046 INFO snapshot.SnapshotManager: SkipList is disabled
2022-04-15 21:21:54,063 INFO util.GSet: Computing capacity for map cachedBlocks
2022-04-15 21:21:54,063 INFO util.GSet: VM type = 64-bit
2022-04-15 21:21:54,064 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
2022-04-15 21:21:54,064 INFO util.GSet: capacity = 2^18 = 262144 entries
2022-04-15 21:21:54,108 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2022-04-15 21:21:54,109 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2022-04-15 21:21:54,109 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2022-04-15 21:21:54,133 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2022-04-15 21:21:54,133 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2022-04-15 21:21:54,139 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2022-04-15 21:21:54,139 INFO util.GSet: VM type = 64-bit
2022-04-15 21:21:54,140 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
2022-04-15 21:21:54,140 INFO util.GSet: capacity = 2^15 = 32768 entries
2022-04-15 21:22:03,246 INFO namenode.FSImage: Allocated new BlockPoolId: BP-9220273-192.168.179.1-1650028923233
2022-04-15 21:22:03,275 INFO common.Storage: Storage directory D:\Development\Hadoop\data\namenode has been successfully formatted.
2022-04-15 21:22:03,330 INFO namenode.FSImageFormatProtobuf: Saving image file D:\Development\Hadoop\data\namenode\current\fsimage.ckpt_0000000000000000000 using no compression
2022-04-15 21:22:03,560 INFO namenode.FSImageFormatProtobuf: Image file D:\Development\Hadoop\data\namenode\current\fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2022-04-15 21:22:03,602 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2022-04-15 21:22:03,616 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Coding/192.168.179.1
************************************************************/
Start the Hadoop cluster
Go into the sbin directory of the Hadoop installation and run start-all.cmd, which starts the components below.
Check the running processes with jps:
D:\Development\Hadoop\sbin>jps
10016 DataNode
12592 NodeManager
13748 ResourceManager
8904 NameNode
1436 Jps
Access test
NameNode web UI: http://localhost:9870
YARN ResourceManager UI: http://localhost:8088/cluster
Local development on Windows
Add the dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Upload a file to HDFS
Upload the wordcount.txt file to HDFS, for example with hdfs dfs -mkdir -p /wordcount followed by hdfs dfs -put wordcount.txt /wordcount; the same can also be done from Java, as sketched below.
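A minimal upload sketch using the HDFS FileSystem API; the class name and the local source path are illustrative, and fs.defaultFS is set to the address configured in core-site.xml above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HdfsUploader {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration(true);
        // Point at the local HDFS instance configured above
        configuration.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(configuration)) {
            // Create the target directory if it does not exist yet
            fs.mkdirs(new Path("/wordcount"));
            // Illustrative local path; point it at the actual wordcount.txt
            fs.copyFromLocalFile(new Path("D:/wordcount.txt"), new Path("/wordcount/wordcount.txt"));
        }
    }
}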
Create the Job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        // Load the configuration (reads the config files on the classpath)
        Configuration configuration = new Configuration(true);
        // Run in local mode
        configuration.set("mapreduce.framework.name", "local");
        // Create the job
        Job job = Job.getInstance(configuration);
        // Set the job's main class
        job.setJarByClass(WordCountJob.class);
        // Set the job name
        job.setJobName("wordcount-" + System.currentTimeMillis());
        // Set the number of reduce tasks
        job.setNumReduceTasks(2);
        // Set the input path
        FileInputFormat.setInputPaths(job, new Path("/wordcount/wordcount.txt"));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path("/wordcount/wordcount_" + System.currentTimeMillis()));
        // Set the Map output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}
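Because mapreduce.framework.name is set to local here, the job runs in Hadoop's LocalJobRunner inside the current JVM and only HDFS needs to be reachable, which keeps debugging from the IDE straightforward; submitting to the YARN cluster instead (the yarn value from mapred-site.xml) typically needs additional setup such as a packaged job jar.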
Create the Mapper
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reusable count value of 1
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String valueString = value.toString();
        // Split the line into words
        String[] values = valueString.split(" ");
        // Emit a (word, 1) pair for each word
        for (String val : values) {
            context.write(new Text(val), one);
        }
    }
}
Create the Reducer
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.Iterator;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Iterate over all counts received for the same word
        Iterator<IntWritable> iterator = values.iterator();
        // Accumulate the total count for this word
        int count = 0;
        while (iterator.hasNext()) {
            count += iterator.next().get();
        }
        // Emit (word, total count)
        context.write(key, new IntWritable(count));
    }
}
Add the configuration files
In the project's resources directory, add the Hadoop configuration files (copied from %HADOOP_HOME%\etc\hadoop):
yarn-site.xml
core-site.xml
hdfs-site.xml
mapred-site.xml
Run the Job
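Run the main method of WordCountJob. When the job finishes, the result is written to the timestamped output directory under /wordcount; with two reduce tasks there are two result files (part-r-00000 and part-r-00001), which can be listed with hdfs dfs -ls /wordcount and viewed with hdfs dfs -cat.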