hadoop-mapreduce-user mailing list archives

From Bryan Yeung <brye...@gmail.com>
Subject Re: map and reduce with different value classes
Date Tue, 17 Apr 2012 03:55:24 GMT
Oh no!  I just figured it out :-/

It's actually failing in the Combine step, with a type error about the
Reduce class: the WordCount example uses the same Reduce class as both
the reducer and the combiner, and a combiner's output types must match
the map output types, which my modified reducer no longer emits.

This makes sense now.

Sorry for the silly question, and thanks for the help!

Bryan
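For the archives, the fix that follows from this is either to drop the combiner line from the driver, or to keep a separate combiner whose output types still match the map output. A minimal sketch, assuming the modified WordCount from the attached diff; `IntSumCombiner` is a hypothetical class name introduced here, not part of the original example:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A combiner runs on map output and feeds the reducer, so its input AND
// output types must both be the map output types (Text, IntWritable).
// The modified IntSumReducer emits (Text, Text), which is why reusing it
// as the combiner fails.
public class IntSumCombiner
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result); // partial sum, still an IntWritable
  }
}
```

In the driver, `job.setCombinerClass(IntSumCombiner.class);` would then replace the `setCombinerClass(IntSumReducer.class)` line; simply deleting that line also works, at the cost of more data shuffled to the reducers.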

On Mon, Apr 16, 2012 at 11:36 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
> Hi Bryan
>
>     Can you post the error stack trace?
>
> Regards
> Bejoy KS
>
> On Tue, Apr 17, 2012 at 8:41 AM, Bryan Yeung <bryeung@gmail.com> wrote:
>> Hello Bejoy,
>>
>> Thanks for your reply.
>>
>> Isn't that exactly what I've done with my modifications to
>> WordCount.java?  Could you have a look at the diff I supplied and/or
>> the WordCount.java file I attached and tell me how I've deviated from
>> what you say below?
>>
>> Thanks,
>>
>> Bryan
>>
>> On Mon, Apr 16, 2012 at 11:03 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>>> Hi Bryan
>>>      You can use different key and value types with the following steps:
>>> - ensure that the map output key/value types match the reducer input
>>> key/value types
>>> - declare them in your driver class:
>>>
>>> // set map output key/value types
>>> job.setMapOutputKeyClass(theKeyClass);
>>> job.setMapOutputValueClass(theValueClass);
>>>
>>> // set final/reduce output key/value types
>>> job.setOutputKeyClass(Text.class);
>>> job.setOutputValueClass(IntWritable.class);
>>>
>>> If the map output and reduce output key/value types are the same, you
>>> only need to specify the final output types.
>>>
>>> Regards
>>> Bejoy KS
>>>
>>>
>>>
>>>
>>> On Tue, Apr 17, 2012 at 7:14 AM, Bryan Yeung <bryeung@gmail.com> wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> I'm relatively new to Hadoop MapReduce, and I'm trying to get this
>>>> simple modification to the WordCount example working.
>>>>
>>>> I'm using hadoop-1.0.2, and I've included both a convenient diff and
>>>> also attached my new WordCount.java file.
>>>>
>>>> What I am trying to achieve is to have the value class output by the
>>>> map phase differ from the value class output by the reduce phase.
>>>>
>>>> Any help would be greatly appreciated!
>>>>
>>>> Thanks,
>>>>
>>>> Bryan
>>>>
>>>> diff --git a/WordCount.java.orig b/WordCount.java
>>>> index 81a6c21..6a768f7 100644
>>>> --- a/WordCount.java.orig
>>>> +++ b/WordCount.java
>>>> @@ -33,8 +33,8 @@ public class WordCount {
>>>>   }
>>>>
>>>>   public static class IntSumReducer
>>>> -       extends Reducer<Text,IntWritable,Text,IntWritable> {
>>>> -    private IntWritable result = new IntWritable();
>>>> +       extends Reducer<Text,IntWritable,Text,Text> {
>>>> +    private Text result = new Text();
>>>>
>>>>     public void reduce(Text key, Iterable<IntWritable> values,
>>>>                        Context context
>>>> @@ -43,7 +43,7 @@ public class WordCount {
>>>>       for (IntWritable val : values) {
>>>>         sum += val.get();
>>>>       }
>>>> -      result.set(sum);
>>>> +      result.set("" + sum);
>>>>       context.write(key, result);
>>>>     }
>>>>   }
>>>> @@ -58,10 +58,11 @@ public class WordCount {
>>>>     Job job = new Job(conf, "word count");
>>>>     job.setJarByClass(WordCount.class);
>>>>     job.setMapperClass(TokenizerMapper.class);
>>>> +       job.setMapOutputValueClass(IntWritable.class);
>>>>     job.setCombinerClass(IntSumReducer.class);
>>>>     job.setReducerClass(IntSumReducer.class);
>>>>     job.setOutputKeyClass(Text.class);
>>>> -    job.setOutputValueClass(IntWritable.class);
>>>> +    job.setOutputValueClass(Text.class);
>>>>     FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
>>>>     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
>>>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
