hadoop-common-user mailing list archives

From bejoy.had...@gmail.com
Subject Re: mapreduce linear chaining: ClassCastException
Date Sat, 15 Oct 2011 08:08:12 GMT
    I believe what is happening in your case is this:
The first MapReduce job runs to completion. When you trigger the second MapReduce job, it runs with the default input format, TextInputFormat, whose keys are of type LongWritable (the byte offset of each line) and whose values are Text. Also by default, a MapReduce job's output format is TextOutputFormat, which writes each key-value pair as tab-separated text. When you need another MR job to consume that output as key-value pairs, use KeyValueTextInputFormat, ie while setting the config parameters for the second job, set
jobConf.setInputFormat(KeyValueTextInputFormat.class);
If your output key-value pairs use a separator other than the default tab, then for the second job you need to specify that as well, using the key.value.separator.in.input.line property.

In short, for your case, doing the following in the second MapReduce job should get things in place:
- use jobConf.setInputFormat(KeyValueTextInputFormat.class)
- alter your mapper to accept keys and values of type Text, Text
- swap the key and value within the mapper for output to the reducer, converting the count from Text to IntWritable

To be noted here: AFAIK, KeyValueTextInputFormat is not a part of the new mapreduce API.
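To see the data flow concretely, here is a small plain-Java sketch (no Hadoop dependencies; splitLine and swap are hypothetical helpers, not Hadoop classes) of what happens to one line of the first job's output: TextOutputFormat wrote "word<TAB>count", KeyValueTextInputFormat splits it on the first separator into a key and a value, and the second mapper then swaps and converts them:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueSplitDemo {

    // Mimics what KeyValueTextInputFormat does with each line of the first
    // job's TextOutputFormat output: split on the first occurrence of the
    // separator (tab by default) into a key and a value, both as text.
    // If the separator is absent, the whole line becomes the key.
    static Map.Entry<String, String> splitLine(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    // Mimics the swap the second mapper must do: (word, count) -> (count, word),
    // parsing the count from text to an int, the way it would be wrapped in an
    // IntWritable before output.collect(sum, word).
    static Map.Entry<Integer, String> swap(Map.Entry<String, String> kv) {
        return new SimpleEntry<>(Integer.parseInt(kv.getValue()), kv.getKey());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = splitLine("hadoop\t42", '\t');
        Map.Entry<Integer, String> swapped = swap(kv);
        System.out.println(swapped.getKey() + "\t" + swapped.getValue()); // 42	hadoop
    }
}
```

The point of the sketch is that both halves of the split line arrive at the mapper as text, which is why the second mapper must declare Text, Text input types and do the int conversion itself.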

Hope it helps.

Bejoy K S

-----Original Message-----
From: "Periya.Data" <periya.data@gmail.com>
Date: Fri, 14 Oct 2011 17:31:27 
To: <common-user@hadoop.apache.org>; <cdh-user@cloudera.org>
Reply-To: common-user@hadoop.apache.org
Subject: mapreduce linear chaining: ClassCastException

Hi all,
   I am trying a simple extension of WordCount example in Hadoop. I want to
get a frequency of wordcounts in descending order. To that I employ a linear
chain of MR jobs. The first MR job (MR-1) does the regular wordcount (the
usual example). For the next MR job => I set the mapper to swap the <word,
count> to <count, word>. Then,  have the Identity reducer to simply store
the results.

My MR-1 does its job correctly and stores the result in a temp path.

Question 1: The mapper of the second MR job (MR-2) doesn't like the input
format. I have properly set the input format for MapClass2, both what it
expects and what its output must be. It seems to be expecting a LongWritable.
I suspect that it is trying to look at some index file; I am not sure.

It throws an error like this:

    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text

Some Info:
- I use old API (org.apache.hadoop.mapred.*). I am asked to stick with it
for now.
- I use hadoop-0.20.2

For MR-1:
- conf1.setOutputKeyClass(Text.class);
- conf1.setOutputValueClass(IntWritable.class);

For MR-2
- takes in a Text (word) and IntWritable (sum)
- conf2.setOutputKeyClass(IntWritable.class);
- conf2.setOutputValueClass(Text.class);

public class MapClass2 extends MapReduceBase
      implements Mapper<Text, IntWritable, IntWritable, Text> {

      public void map(Text word, IntWritable sum,
              OutputCollector<IntWritable, Text> output,
              Reporter reporter) throws IOException {

          output.collect(sum, word);   // <sum, word>
      }
}

Any suggestions would be helpful. Is my MapClass2 code right in the first
place, for the swapping? Or should I assume that the mapper reads line by line,
so I must read in one line, use a StringTokenizer to split it up, and
convert the second token (sum) from String to int? Or should I mess around
with the OutputKeyComparator class?

