MapReduce--11--Student Scores (Basic)--Requirement 2


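For anyone new to MapReduce, it is essential to learn the basic programming pattern and to understand how MapReduce performs distributed computation over large datasets.

The following requirement is an exercise to strengthen your understanding of MapReduce programming.

First, the data: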

computer,huangxiaoming,85
computer,xuzheng,54
computer,huangbo,86
computer,liutao,85
computer,huanglei,99
computer,liujialing,85
computer,liuyifei,75
computer,huangdatou,48
computer,huangjiaju,88
computer,huangzitao,85
english,zhaobenshan,57
english,liuyifei,85
english,liuyifei,76
english,huangdatou,48
english,zhouqi,85
english,huangbo,85
english,huangxiaoming,96
english,huanglei,85
english,liujialing,75
algorithm,liuyifei,75
algorithm,huanglei,76
algorithm,huangjiaju,85
algorithm,liutao,85
algorithm,huangdou,42
algorithm,huangzitao,81
math,wangbaoqiang,85
math,huanglei,76
math,huangjiaju,85
math,liutao,48
math,xuzheng,54
math,huangxiaoming,85
math,liujialing,85

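That is the full dataset. Each line has three comma-separated fields: course, name, score.

Requirement 2: for each course, find every score that occurs more than once, and report how many times it occurs and which students received it.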
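As a minimal sketch of the record layout described above, the following plain-Java helper splits one line into its three fields. `ScoreRecord` and `parse` are hypothetical names introduced only for illustration; returning `null` for malformed lines is an assumption about how bad input should be handled, not something the original job does.

```java
// Hypothetical helper for illustration; not part of the original job.
class ScoreRecord {
    final String course;
    final String name;
    final int score;

    ScoreRecord(String course, String name, int score) {
        this.course = course;
        this.name = name;
        this.score = score;
    }

    // Parses "course,name,score". Returns null when the line does not have
    // exactly three fields or when the score is not an integer (assumed policy).
    static ScoreRecord parse(String line) {
        if (line == null) {
            return null;
        }
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            return null;
        }
        try {
            int score = Integer.parseInt(fields[2].trim());
            return new ScoreRecord(fields[0].trim(), fields[1].trim(), score);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ScoreRecord r = parse("computer,huangxiaoming,85");
        System.out.println(r.course + " / " + r.name + " / " + r.score);
    }
}
```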

Output format:
course    score    count    students with that score
Example:
computer    85    4    liutao,huangzitao,liujialing,huangxiaoming

 

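Approach:

In the mapper phase, the output key-value pair is:

key: course and score

value: student name

In the reducer phase, the reduce method receives:

key: course and score

values: an iterator over the names of all students who got that score in that course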
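The grouping idea above can be tried without a cluster. The following is a minimal sketch in plain Java that imitates the map, shuffle, and reduce steps in memory. It uses no Hadoop classes; `LocalScoreGrouping`, `mapLine`, and `run` are hypothetical names introduced here, and the `TreeMap` only stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of map -> shuffle -> reduce; for illustration only.
class LocalScoreGrouping {

    // "Map" step: one input line becomes a (course \t score, name) pair.
    static String[] mapLine(String line) {
        String[] fields = line.split(",");
        return new String[] { fields[0] + "\t" + fields[2], fields[1] };
    }

    // "Shuffle" and "reduce" steps: group names by key, then keep only
    // the keys that collected more than one name.
    static List<String> run(List<String> lines) {
        // TreeMap keeps keys sorted, standing in for the framework's key ordering.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            List<String> names = e.getValue();
            if (names.size() > 1) {
                out.add(e.getKey() + "\t" + names.size() + "\t" + String.join(",", names));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "algorithm,huangjiaju,85",
                "algorithm,liutao,85",
                "algorithm,huangdou,42");
        for (String line : run(sample)) {
            System.out.println(line);
        }
    }
}
```

Here the names appear in input order. In a real job the order of values within one key is generally not guaranteed, so the name list may come out in a different order.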

 

The implementation:

package com.ghgj.mazh.mapreduce.exercise.coursescore2;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class CourseScoreMR_Basic_02 {

    public static void main(String[] args) throws Exception {
        // Initialize parameters (local paths used for testing)
        String inputPath = "D:\\bigdata\\coursescore1\\input";
        String outputPath = "D:\\bigdata\\coursescore1\\output2";

        // Create the Job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Tell Hadoop which jar contains this job
        job.setJarByClass(CourseScoreMR_Basic_02.class);

        // Set the mapper and reducer classes and the other job components
        job.setMapperClass(Mapper_CS.class);
        job.setReducerClass(Reducer_CS.class);
        // Output types of the map task
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Output types of the reduce task
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output paths; delete the output directory
        // if it already exists, otherwise the job refuses to start
        Path input = new Path(inputPath);
        Path output = new Path(outputPath);
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
        FileInputFormat.setInputPaths(job, input);
        FileOutputFormat.setOutputPath(job, output);

        // Submit the job and wait for it to finish
        boolean waitForCompletion = job.waitForCompletion(true);
        System.exit(waitForCompletion ? 0 : 1);
    }

    /**
     * Mapper:
     * <p>
     * Input key: byte offset of the line in the input file
     * Input value: one line, e.g. computer,huangxiaoming,85
     * <p>
     * Output key: course + "\t" + score
     * Output value: name
     */
    private static class Mapper_CS extends Mapper<LongWritable, Text, Text, Text> {

        // Reused across calls to avoid allocating new objects per record
        Text keyOut = new Text();
        Text valueOut = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String[] splits = value.toString().split(",");
            // Skip blank or malformed lines instead of failing the task
            if (splits.length < 3) {
                return;
            }
            String course = splits[0];
            String name = splits[1];
            String score = splits[2];

            keyOut.set(course + "\t" + score);
            valueOut.set(name);

            context.write(keyOut, valueOut);
        }
    }

    /**
     * Reducer:
     * <p>
     * Input key: course + "\t" + score
     * Input values: names of all students with that score in that course
     * <p>
     * Output key: course + "\t" + score
     * Output value: number + "\t" + names
     */
    private static class Reducer_CS extends Reducer<Text, Text, Text, Text> {

        Text valueOut = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

            // Join the names with commas and count them in a single pass
            StringBuilder sb = new StringBuilder();
            int number = 0;
            for (Text t : values) {
                if (number > 0) {
                    sb.append(",");
                }
                sb.append(t.toString());
                number++;
            }

            // Only emit scores shared by more than one student
            if (number > 1) {
                valueOut.set(number + "\t" + sb);
                context.write(key, valueOut);
            }
        }
    }
}
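The reducer above concatenates names in whatever order the framework delivers them, and Hadoop does not sort the values within a key by default, so the name order can differ between runs. Below is a minimal sketch of one way to make the list deterministic, written in plain Java with no Hadoop types. `NameJoiner` and `summarize` are hypothetical names introduced here, and sorting alphabetically is an assumption, not part of the original job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the original job.
class NameJoiner {

    // Returns count + "\t" + comma-separated sorted names,
    // or null when fewer than two names share the score.
    static String summarize(Iterable<String> names) {
        List<String> list = new ArrayList<>();
        for (String n : names) {
            list.add(n);
        }
        if (list.size() < 2) {
            return null;
        }
        Collections.sort(list); // assumed ordering: alphabetical
        return list.size() + "\t" + String.join(",", list);
    }

    public static void main(String[] args) {
        System.out.println(summarize(Arrays.asList("liutao", "huangjiaju")));
    }
}
```

Inside a real reducer, each `Text` value would need to be converted with `toString()` before being added to the list, because Hadoop reuses the same `Text` object while iterating.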

 

Running the code produces the following output:

algorithm    85    2    liutao,huangjiaju
computer    85    4    liutao,huangzitao,liujialing,huangxiaoming
english    85    4    huangbo,huanglei,zhouqi,liuyifei
math    85    4    wangbaoqiang,huangjiaju,huangxiaoming,liujialing

 

This is the required result.
