Hadoop and MapReduce

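I am new to HDFS and MapReduce and am trying to compute some survey statistics. The input file has the format Age Points Sex Category, and all four of these fields are numbers. Is this a correct start?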

    public static class MapClass extends MapReduceBase
            implements Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
        private final static IntWritable Age = new IntWritable(1);
        private IntWritable AgeCount = new IntWritable();

        public void map(Text key, Text value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) throws IOException {
            AgeCount.set(Integer.parseInt(value.toString()));
            output.collect(AgeCount, Age);
        }
    }

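My questions:

1. Is this a correct start?
2. If I want to collect other attributes (for example Sex and Points), do I just add another output.collect statement? I know I have to read the line and split it into its attributes.
3. Where it says implements Mapper above, is it right that I made all four type parameters IntWritable?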


1 answer

蓄荣糖些

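The Mapper interface takes four type parameters, in this order: map input key, map input value, map output key, and map output value. Because you are reading a plain text file, each call to map receives one line: the input key is the byte offset of that line in the file (a LongWritable) and the input value is the line itself (a Text). Using IntWritable for the input key and value is therefore wrong; your four integers only become IntWritables after you parse the line. In addition, the types you specify in your MapClass declaration do not match the parameter types of your map function (the declaration says IntWritable, IntWritable, while map takes Text, Text). Since you are processing a text file, your MapClass should be declared as follows: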

    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, IntWritable>
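To see why the input key is a LongWritable: with Hadoop's default TextInputFormat, each record's key is the byte offset at which the line starts and its value is the text of the line. The plain-Java sketch below mimics that splitting without any Hadoop classes so it runs on its own. The class LineRecords and its methods are hypothetical illustrations, it assumes '\n' line endings, and it is not Hadoop's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free illustration of line-oriented input records:
// key = byte offset where the line starts (LongWritable in Hadoop),
// value = the line text (Text in Hadoop).
public class LineRecords {

    // One map input record.
    public record Record(long offset, String line) {}

    // Splits UTF-8 text on '\n' (assumed line ending) and tracks byte offsets.
    public static List<Record> split(String contents) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        for (String line : contents.split("\n")) {
            records.add(new Record(offset, line));
            // Advance past this line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : split("25 80 1 3\n31 65 2 1\n")) {
            System.out.println(r.offset() + " -> " + r.line());
        }
    }
}
```

For the two sample lines the keys are 0 and 10, since the first line is 9 bytes long plus its newline.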

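In essence, each map call receives exactly one line of the text file as its input; inside the map function you parse that line into the fields you need and convert them to ints. Your map function therefore has the following signature: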

    public void map(LongWritable key, Text value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException { ... }
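The per-line logic of such a map function can be sketched in plain Java, again without Hadoop classes so it runs standalone. This sketch assumes the four numbers on each line are separated by whitespace, uses a BiConsumer as a stand-in for OutputCollector.collect, and adds a small in-memory summing step in place of the shuffle and reducer, so it counts respondents per age (what the question's code, emitting (age, 1), appears to aim for). AgeCountSketch, map and countByAge are hypothetical names; in a real job the parsing would live in MapClass.map and the summing in a Reducer.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical, Hadoop-free sketch: parse "Age Points Sex Category" and emit (age, 1).
public class AgeCountSketch {

    // 'output' stands in for OutputCollector<IntWritable, IntWritable>.collect(key, value).
    public static void map(long offset, String line, BiConsumer<Integer, Integer> output) {
        String[] fields = line.trim().split("\\s+"); // assumes whitespace-separated fields
        if (fields.length != 4) {
            return; // skip malformed lines instead of failing
        }
        int age = Integer.parseInt(fields[0]);
        // fields[1] = points, fields[2] = sex, fields[3] = category (unused here)
        output.accept(age, 1);
    }

    // Stand-in for shuffle + a summing reducer: total count per age, sorted by age.
    public static Map<Integer, Integer> countByAge(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, (key, value) -> counts.merge(key, value, Integer::sum));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("25 80 1 3", "31 65 2 1", "25 90 2 2");
        System.out.println(countByAge(lines)); // prints {25=2, 31=1}
    }
}
```

To count Sex or Points as well, you could emit more than one pair per line, but with a plain integer key the attributes would collide (age 1 and sex 1 would land under the same key). One option is a tagged key such as the string "sex=1", which would change the declared output key type from IntWritable to Text.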

