The WordCount example in Hadoop


Hadoop ships with a built-in word-count example; this walkthrough uses Hadoop 2.5.1.

1. Create the input directory

Create the input directory in HDFS. Do not pre-create the output directory: the job creates it itself and fails if it already exists.

hdfs dfs -mkdir [-p] <paths>

[root@linuxmain hadoop]# bin/hdfs dfs -mkdir /input
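
The same directory can also be created programmatically. Below is a minimal Java sketch using the HDFS FileSystem API (the class name MakeInputDir is just for illustration; it assumes the cluster's core-site.xml is on the classpath so fs.defaultFS points at HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeInputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml
        FileSystem fs = FileSystem.get(conf);
        // mkdirs behaves like -mkdir -p: it creates missing parents
        // and returns true if the directory exists afterwards
        if (fs.mkdirs(new Path("/input"))) {
            System.out.println("/input is ready");
        }
        fs.close();
    }
}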


2. Run the job

The general form of the command is shown below; the -files, -libjars and -archives generic options are optional, and a plain run needs only the input and output paths:

bin/hadoop jar hadoop-mapreduce-examples-<ver>.jar wordcount -files cachefile.txt -libjars mylib.jar -archives myarchive.zip input output

[root@linuxmain hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /input/ /output
14/11/27 19:05:24 INFO client.RMProxy: Connecting to ResourceManager at LinuxMain/192.168.1.216:8032
14/11/27 19:05:24 INFO input.FileInputFormat: Total input paths to process : 1
14/11/27 19:05:24 INFO mapreduce.JobSubmitter: number of splits:1
14/11/27 19:05:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417139642519_0006
14/11/27 19:05:25 INFO impl.YarnClientImpl: Submitted application application_1417139642519_0006
14/11/27 19:05:25 INFO mapreduce.Job: The url to track the job: http://LinuxMain:8088/proxy/application_1417139642519_0006/
14/11/27 19:05:25 INFO mapreduce.Job: Running job: job_1417139642519_0006
14/11/27 19:05:30 INFO mapreduce.Job: Job job_1417139642519_0006 running in uber mode : false
14/11/27 19:05:30 INFO mapreduce.Job: map 0% reduce 0%
14/11/27 19:05:39 INFO mapreduce.Job: map 100% reduce 0%
14/11/27 19:05:44 INFO mapreduce.Job: map 100% reduce 100%
14/11/27 19:05:44 INFO mapreduce.Job: Job job_1417139642519_0006 completed successfully
14/11/27 19:05:45 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=80
FILE: Number of bytes written=193839
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=144
HDFS: Number of bytes written=46
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6643
Total time spent by all reduces in occupied slots (ms)=2532
Total time spent by all map tasks (ms)=6643
Total time spent by all reduce tasks (ms)=2532
Total vcore-seconds taken by all map tasks=6643
Total vcore-seconds taken by all reduce tasks=2532
Total megabyte-seconds taken by all map tasks=6802432
Total megabyte-seconds taken by all reduce tasks=2592768
Map-Reduce Framework
Map input records=3
Map output records=9
Map output bytes=80
Map output materialized bytes=80
Input split bytes=100
Combine input records=9
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=80
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=135
CPU time spent (ms)=1070
Physical memory (bytes) snapshot=224751616
Virtual memory (bytes) snapshot=781852672
Total committed heap usage (bytes)=137039872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=44
File Output Format Counters
Bytes Written=46
[root@linuxmain hadoop]#

Take a look at the result:
[root@linuxmain hadoop]# bin/hdfs dfs -cat /output/*

I 1
hello 2
love 1
world 2
yang 1
you 1
yue 1

If you see this output, the job ran successfully and your build is working.
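
For reference, the wordcount shipped in the examples jar (org.apache.hadoop.examples.WordCount) is essentially the classic code from the MapReduce tutorial, sketched here (the bundled class may differ in minor details):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every whitespace-separated token in a line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts per word; also used as the combiner, which is why
    // the job counters above show Combine input/output records
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}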
3. Upload your own file

To count your own text, copy it into the input directory in HDFS:
hdfs dfs -copyFromLocal <localsrc> URI

[root@linuxmain hadoop]# bin/hdfs dfs -copyFromLocal /root/Desktop/new.txt /input

The -f variant below overwrites the destination file if it already exists:
[root@linuxmain hadoop]# bin/hdfs dfs -copyFromLocal -f /root/Desktop/new.txt /input
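
The upload can also be done from Java. Here is a minimal sketch (the class name UploadInput is just for illustration); the overwrite argument of FileSystem.copyFromLocalFile plays the role of -f:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml
        FileSystem fs = FileSystem.get(conf);
        // delSrc=false keeps the local copy; overwrite=true mirrors -copyFromLocal -f
        fs.copyFromLocalFile(false, true,
                new Path("/root/Desktop/new.txt"), new Path("/input"));
        fs.close();
    }
}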


I tested this myself and it works.

