Running MR Programs on Hadoop 2 from Eclipse


Environment: Hadoop 2.2; MyEclipse on Windows.

Submitting an MR job from Eclipse is really nothing more than an ordinary Java program submitting an MR task to the cluster. In Hadoop 1 you only had to point the client at the JobTracker (jt) and the NameNode (fs), typically like this:

 




    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "192.168.128.138:9001");
    conf.set("fs.default.name", "192.168.128.138:9000");



    The code above works fine in Hadoop 1: you really can submit a job to the cluster from plain Java. Hadoop 2, however, has dropped the JobTracker in favor of YARN. So how do we use it? The simplest idea is to set the equivalent properties and give it a try:

     

     


    Configuration conf = new YarnConfiguration();
    conf.set("fs.defaultFS", "hdfs://node31:9000");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "node31:8032");


    Well, with this configuration the program runs, but the first thing you see is the following error:

     

     



    2014-04-03 21:20:21,568 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
        at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
        at org.fansy.hadoop.mr.WordCount.getConf(WordCount.java:104)
        at org.fansy.hadoop.mr.WordCount.runJob(WordCount.java:84)
        at org.fansy.hadoop.mr.WordCount.main(WordCount.java:47)


    You can ignore this error; it seems to show up whenever Hadoop client code is invoked from Windows, because the Shell class looks for winutils.exe under HADOOP_HOME, which is not set here.
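To see where the odd `null\bin\winutils.exe` path comes from, here is a rough, self-contained sketch of the lookup Hadoop's `Shell` class performs at class-load time (the helper below is illustrative, not Hadoop's actual code): when `HADOOP_HOME` is unset, Java string concatenation turns the null home directory into the literal text `null`.

```java
import java.io.File;

public class WinutilsCheck {
    // Illustrative stand-in for Shell.getQualifiedBinPath: resolve
    // ${HADOOP_HOME}/bin/<exe> and complain if it is missing. With
    // HADOOP_HOME unset, `home` is null, and concatenating null into a
    // String prints as "null" -- hence the message
    // "Could not locate executable null\bin\winutils.exe".
    static String qualifiedBinPath(String home, String exe) {
        String full = home + File.separator + "bin" + File.separator + exe;
        if (home == null || !new File(full).canExecute()) {
            throw new RuntimeException(
                "Could not locate executable " + full + " in the Hadoop binaries.");
        }
        return full;
    }

    public static void main(String[] args) {
        try {
            qualifiedBinPath(System.getenv("HADOOP_HOME"), "winutils.exe");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Pointing HADOOP_HOME at a Windows Hadoop distribution that contains winutils.exe would silence the warning, but as noted, the job submits fine without it.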

     

    Next come permission errors of one kind or another, so at this point you need to loosen some permissions; at least that is how I handled it. The directories to open up are /tmp plus the input and output directories of the wordcount job. The command is: $HADOOP_HOME/bin/hadoop fs -chmod -R 777 /tmp
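As a quick reminder of what `-chmod -R 777` actually grants, here is a tiny pure-JDK sketch (the helper name is mine) that renders an octal mode the way `hadoop fs -ls` displays it:

```java
public class PermBits {
    // Render a three-digit octal mode (e.g. 777) as the rwx string shown by
    // `hadoop fs -ls`; 777 means read/write/execute for owner, group and
    // everyone else -- fine for a test cluster, too open for production.
    static String toRwx(int octalDigits) {
        String[] bits = {"---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"};
        int m = Integer.parseInt(Integer.toString(octalDigits), 8);
        return bits[(m >> 6) & 7] + bits[(m >> 3) & 7] + bits[m & 7];
    }

    public static void main(String[] args) {
        System.out.println(toRwx(777)); // rwxrwxrwx
        System.out.println(toRwx(755)); // rwxr-xr-x
    }
}
```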

    Then, once you run into the error below, good: you are already halfway there.

     


    2014-04-03 20:32:36,596 ERROR [main] security.UserGroupInformation (UserGroupInformation.java:doAs(1494)) - PriviledgedActionException as:Administrator (auth:SIMPLE) cause:java.io.IOException: Failed to run job : Application application_1396459813671_0001 failed 2 times due to AM Container for appattempt_1396459813671_0001_000002 exited with  exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
        ... (stack trace omitted) ...
    .Failing this attempt.. Failing the application.

    Googling that error turns up https://issues.apache.org/jira/browse/MAPREDUCE-5655, and yes, that issue is exactly our solution.

     

    Let me break the fix into steps.

    1. Modify the source of MRApps.java and YARNRunner.java, then repackage and replace the corresponding class files in the original jars. I have already packaged these two jars; they can be downloaded here. Replace the corresponding jars on the cluster, and note that the jars imported into MyEclipse must be replaced as well. Speaking of the MyEclipse jars, here is a picture of them first:

    [screenshots: the Hadoop jars referenced in the MyEclipse build path]


    2. Modify mapred-default.xml (this only needs to be done in the jar imported into MyEclipse; the modified jar does not need to be uploaded to the cluster), adding:

     


      <property>
        <name>mapred.remote.os</name>
        <value>Linux</value>
        <description>
          Remote MapReduce framework's OS, can be either Linux or Windows
        </description>
      </property>


      (As an aside: after adding this property I expected conf.get("mapred.remote.os") on a freshly created Configuration to return Linux, but I got null instead, and I am not sure why. One plausible explanation is that a plain Configuration only registers core-default.xml and core-site.xml as default resources, so values that live in mapred-default.xml stay invisible until something such as a JobConf registers that resource.)
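One way to picture that resource-registration behavior is a toy model of Configuration's lookup (all names below are invented for illustration; this is not Hadoop's actual implementation): a property defined in mapred-default.xml simply is not consulted until that resource is added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfSketch {
    // Toy model: values are looked up only in resources that have been
    // registered, later resources overriding earlier ones. A plain
    // Configuration never registers mapred-default.xml on its own.
    private final Map<String, Map<String, String>> known = new HashMap<>();
    private final List<String> loaded = new ArrayList<>();

    void defineResource(String name, Map<String, String> props) { known.put(name, props); }
    void addResource(String name) { loaded.add(name); }

    String get(String key) {
        String v = null;
        for (String r : loaded) {
            v = known.getOrDefault(r, Map.of()).getOrDefault(key, v);
        }
        return v;
    }

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        conf.defineResource("mapred-default.xml", Map.of("mapred.remote.os", "Linux"));
        conf.addResource("core-default.xml");     // roughly what `new Configuration()` loads
        System.out.println(conf.get("mapred.remote.os")); // null
        conf.addResource("mapred-default.xml");   // roughly what JobConf adds
        System.out.println(conf.get("mapred.remote.os")); // Linux
    }
}
```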

       

      The mapred-default.xml file sits here:

      [screenshot: location of mapred-default.xml inside the imported jar]


      Now run the program again. It can basically be submitted, but it still fails, and the log shows the following error:

      Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

      Well, having talked this long, I should finally paste my wordcount program:

       

       

      package org.fansy.hadoop.mr;

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.LocatedFileStatus;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.RemoteIterator;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.ClusterStatus;
      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.yarn.conf.YarnConfiguration;
      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;

      public class WordCount {
          private static Logger log = LoggerFactory.getLogger(WordCount.class);

          public static class WCMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
              public void map(LongWritable key, Text value, Context cxt)
                      throws IOException, InterruptedException {
                  // String[] values = value.toString().split("[,| ]");
                  cxt.write(key, value);
              }
          }

          public static class WCReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
              public void reduce(LongWritable key, Iterable<Text> values, Context cxt)
                      throws IOException, InterruptedException {
                  StringBuffer buff = new StringBuffer();
                  for (Text v : values) {
                      buff.append(v.toString()).append("\t");
                  }
                  cxt.write(key, new Text(buff.toString()));
              }
          }

          public static void main(String[] args) throws Exception {
              // checkFS();
              String input = "hdfs://node31:9000/input/test.dat";
              String output = "hdfs://node31:9000/output/wc003";
              runJob(input, output);
              // runJob(args[0], args[1]);
              // upload();
          }

          /**
           * test operating the hdfs
           * @throws IOException
           */
          public static void checkFS() throws IOException {
              Configuration conf = getConf();
              Path f = new Path("/user");
              FileSystem fs = FileSystem.get(f.toUri(), conf);

              RemoteIterator<LocatedFileStatus> paths = fs.listFiles(f, true);
              while (paths.hasNext()) {
                  System.out.println(paths.next());
              }
          }

          public static void upload() throws IOException {
              Configuration conf = getConf();
              Path f = new Path("d:\\wordcount.jar");
              FileSystem fs = FileSystem.get(f.toUri(), conf);
              fs.copyFromLocalFile(true, f, new Path("/input/wordcount.jar"));
              log.info("done ...");
          }

          /**
           * test the job submit
           * @throws IOException
           * @throws InterruptedException
           * @throws ClassNotFoundException
           */
          public static void runJob(String input, String output)
                  throws IOException, ClassNotFoundException, InterruptedException {
              Configuration conf = getConf();
              Job job = new Job(conf, "word count");
              // job.setJar("hdfs://node31:9000/input/wordcount.jar");
              job.setJobName("wordcount");
              job.setJarByClass(WordCount.class);
              // job.setOutputFormatClass(SequenceFileOutputFormat.class);
              job.setOutputKeyClass(LongWritable.class);
              job.setOutputValueClass(Text.class);

              job.setMapperClass(WCMapper.class);
              job.setCombinerClass(WCReducer.class);
              job.setReducerClass(WCReducer.class);

              FileInputFormat.addInputPath(job, new Path(input));
              // SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
              FileOutputFormat.setOutputPath(job, new Path(output));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }

          private static Configuration getConf() throws IOException {
              Configuration conf = new YarnConfiguration();
              conf.set("fs.defaultFS", "hdfs://node31:9000");
              conf.set("mapreduce.framework.name", "yarn");
              conf.set("yarn.resourcemanager.address", "node31:8032");
              // conf.set("mapred.remote.os", "Linux");
              System.out.println(conf.get("mapred.remote.os"));
              // JobClient client = new JobClient(conf);
              // ClusterStatus cluster = client.getClusterStatus();
              return conf;
          }
      }


      3. So how do we fix that error? Following the JIRA's solution, we need to modify mapred-default.xml and yarn-default.xml. We already modified mapred-default.xml above; edit it again and add:

       

       


      <property>
          <name>mapreduce.application.classpath</name>
          <value>
              $HADOOP_CONF_DIR,
              $HADOOP_COMMON_HOME/share/hadoop/common/*,
              $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
              $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
              $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
              $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
              $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
              $HADOOP_YARN_HOME/share/hadoop/yarn/*,
              $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
          </value>
      </property>


      Make the same change to yarn-default.xml, which lives in hadoop-yarn-common-2.2.0.jar. The addition is:

       

       


      <property>
          <name>mapreduce.application.classpath</name>
          <value>
              $HADOOP_CONF_DIR,
              $HADOOP_COMMON_HOME/share/hadoop/common/*,
              $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
              $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
              $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
              $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
              $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
              $HADOOP_YARN_HOME/share/hadoop/yarn/*,
              $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
          </value>
      </property>


      As before, these two jars only need to be replaced in MyEclipse; they do not need to be replaced on the cluster.
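What this classpath property is working around, per MAPREDUCE-5655, is a cross-platform mismatch: the Windows client assembled the ApplicationMaster's launch classpath using the client OS's conventions (`;` separators, `%VAR%` expansion), which a Linux NodeManager's shell cannot interpret, so MRAppMaster was never found. A rough, self-contained illustration of that idea (helper names are mine, not Hadoop's internals):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class RemoteClasspath {
    // Build a container classpath using the *remote* OS's conventions:
    // ':' vs ';' separators and $VAR vs %VAR% environment references.
    // Formatting for the wrong OS (the pre-fix behavior when submitting
    // from Windows to Linux) yields a string the remote shell cannot
    // expand -- hence "Could not find or load main class ...MRAppMaster".
    static String buildClasspath(String[] entries, boolean remoteIsLinux) {
        String sep = remoteIsLinux ? ":" : ";";
        return Arrays.stream(entries)
                .map(e -> remoteIsLinux ? e : e.replaceAll("\\$(\\w+)", "%$1%"))
                .collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        String[] entries = {"$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*"};
        System.out.println(buildClasspath(entries, true));
        // $HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*
        System.out.println(buildClasspath(entries, false));
        // %HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*
    }
}
```

Keeping the `$VAR` entries literal in the XML lets the Linux side expand them itself, which is why the property lists unexpanded variables.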

       

      4. After the replacements above, run again and you get the following error:

       



      Caused by: java.lang.ClassNotFoundException: Class org.fansy.hadoop.mr.WordCount$WCMapper not found
          at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
          at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
          ... 8 more


      I probably don't need to say much more: an error like this means MyEclipse can already submit jobs to Hadoop 2 and they run; the cluster just cannot find our classes. So, the last step: upload the jar containing the packaged wordcount program to $HADOOP_HOME/share/hadoop/mapreduce/lib and run again (no cluster restart is needed after the upload). And finally, the result below:

       

       

      2014-04-03 21:17:34,289 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
      java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
          at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
          at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
          at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
          at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
          at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
          at org.fansy.hadoop.mr.WordCount.getConf(WordCount.java:104)
          at org.fansy.hadoop.mr.WordCount.runJob(WordCount.java:84)
          at org.fansy.hadoop.mr.WordCount.main(WordCount.java:47)
      Linux
      2014-04-03 21:18:19,853 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      2014-04-03 21:18:20,499 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at node31/192.168.0.31:8032
      2014-04-03 21:18:20,973 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(149)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
      2014-04-03 21:18:21,020 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
      2014-04-03 21:18:21,313 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(394)) - number of splits:1
      2014-04-03 21:18:21,336 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - user.name is deprecated. Instead, use mapreduce.job.user.name
      2014-04-03 21:18:21,337 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
      2014-04-03 21:18:21,337 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - fs.default.name is deprecated. Instead, use fs.defaultFS
      2014-04-03 21:18:21,338 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
      2014-04-03 21:18:21,338 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
      2014-04-03 21:18:21,339 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
      2014-04-03 21:18:21,339 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name
      2014-04-03 21:18:21,339 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
      2014-04-03 21:18:21,340 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
      2014-04-03 21:18:21,340 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
      2014-04-03 21:18:21,342 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
      2014-04-03 21:18:21,343 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
      2014-04-03 21:18:21,343 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
      2014-04-03 21:18:21,513 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(477)) - Submitting tokens for job: job_1396463733942_0003
      2014-04-03 21:18:21,817 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(174)) - Submitted application application_1396463733942_0003 to ResourceManager at node31/192.168.0.31:8032
      2014-04-03 21:18:21,859 INFO  [main] mapreduce.Job (Job.java:submit(1272)) - The url to track the job: http://node31:8088/proxy/application_1396463733942_0003/
      2014-04-03 21:18:21,860 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1396463733942_0003
      2014-04-03 21:18:31,307 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1338)) - Job job_1396463733942_0003 running in uber mode : false
      2014-04-03 21:18:31,311 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 0% reduce 0%
      2014-04-03 21:19:02,346 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 0%
      2014-04-03 21:19:11,416 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 100%
      2014-04-03 21:19:11,425 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Job job_1396463733942_0003 completed successfully
      2014-04-03 21:19:11,552 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1363)) - Counters: 43
          File System Counters
              FILE: Number of bytes read=11139
              FILE: Number of bytes written=182249
              FILE: Number of read operations=0
              FILE: Number of large read operations=0
              FILE: Number of write operations=0
              HDFS: Number of bytes read=8646
              HDFS: Number of bytes written=10161
              HDFS: Number of read operations=6
              HDFS: Number of large read operations=0
              HDFS: Number of write operations=2
          Job Counters
              Launched map tasks=1
              Launched reduce tasks=1
              Data-local map tasks=1
              Total time spent by all maps in occupied slots (ms)=29330
              Total time spent by all reduces in occupied slots (ms)=5825
          Map-Reduce Framework
              Map input records=235
              Map output records=235
              Map output bytes=10428
              Map output materialized bytes=11139
              Input split bytes=98
              Combine input records=235
              Combine output records=235
              Reduce input groups=235
              Reduce shuffle bytes=11139
              Reduce input records=235
              Reduce output records=235
              Spilled Records=470
              Shuffled Maps =1
              Failed Shuffles=0
              Merged Map outputs=1
              GC time elapsed (ms)=124
              CPU time spent (ms)=21920
              Physical memory (bytes) snapshot=299376640
              Virtual memory (bytes) snapshot=1671372800
              Total committed heap usage (bytes)=152834048
          Shuffle Errors
              BAD_ID=0
              CONNECTION=0
              IO_ERROR=0
              WRONG_LENGTH=0
              WRONG_MAP=0
              WRONG_REDUCE=0
          File Input Format Counters
              Bytes Read=8548
          File Output Format Counters
              Bytes Written=10161

       

      The "Linux" you see in the log above is there because I used conf.set("mapred.remote.os", "Linux") in that run; it turns out this setting is not actually needed at runtime.

      Also, if the caller is a Tomcat deployed on Linux invoking the Hadoop 2 cluster to run MR programs, the jar replacement should presumably not be necessary; that still needs verifying.

      Ha, finally sorted. This problem had bothered me for quite a while; several earlier attempts to break through came back empty-handed, which was rather frustrating. Strictly speaking this is not original work either: the fix was worked out upstream as far back as 02/Dec/13. Still, I searched for a long time and found no Chinese write-up of it (and if one exists, then my search skills are to blame for not finding it).
