hadoop-common-user mailing list archives

From "Periya.Data" <periya.d...@gmail.com>
Subject mapreduce linear chaining: ClassCastException
Date Sat, 15 Oct 2011 00:31:27 GMT
Hi all,
   I am trying a simple extension of the WordCount example in Hadoop. I want
to get the word counts sorted by frequency in descending order. To do that, I
employ a linear chain of MR jobs. The first MR job (MR-1) does the regular
word count (the usual example). For the next MR job, I set the mapper to swap
<word, count> to <count, word>. Then I have the identity reducer simply store
the results.

My MR-1 does its job correctly and stores the result in a temp path.

Question 1: The mapper of the second MR job (MR-2) doesn't like the input
format. I believe I have correctly set up MapClass2 for the input it expects
and the output it must produce. Yet the job seems to be handing it a
LongWritable. I suspect that it is trying to look at some index file, but I
am not sure.

It throws an error like this:

    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Some Info:
- I use the old API (org.apache.hadoop.mapred.*). I have been asked to stick
with it for now.
- I use hadoop-0.20.2.

For MR-1:
- conf1.setOutputKeyClass(Text.class);
- conf1.setOutputValueClass(IntWritable.class);

For MR-2:
- takes in a Text (word) and IntWritable (sum)
- conf2.setOutputKeyClass(IntWritable.class);
- conf2.setOutputValueClass(Text.class);

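import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MapClass2 extends MapReduceBase
      implements Mapper<Text, IntWritable, IntWritable, Text> {

      // Swap each <word, sum> record into <sum, word>.
      public void map(Text word, IntWritable sum,
              OutputCollector<IntWritable, Text> output,
              Reporter reporter) throws IOException {

          output.collect(sum, word);   // <sum, word>
      }
}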

Any suggestions would be helpful. Is my MapClass2 code right in the first
place for swapping? Or should I assume that the mapper reads its input line
by line, so that I must read in one line, split it with StrTokenizer, and
convert the second token (sum) from a String to an int? Or should I be
working with the OutputKeyComparator class instead?
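
The line-by-line alternative asked about above can be sketched in plain Java,
without any Hadoop types. This is only an illustration under two assumptions:
that MR-2 reads MR-1's output through the default text input format, so each
record arrives as a byte offset (a LongWritable) plus one line of text rather
than as <Text, IntWritable>, and that MR-1 wrote each line as
"word<TAB>count". The class SwapLineSketch and its methods are hypothetical
names, not part of Hadoop. The reversed comparator stands in for whatever
descending key ordering the real job would need.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): shows the parse-and-swap logic
// a line-oriented MR-2 mapper would need, plus a descending sort on count.
class SwapLineSketch {

    // Parse one "word<TAB>count" line (assumed format) into <count, word>.
    static Map.Entry<Integer, String> swap(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected word<TAB>count: " + line);
        }
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new SimpleEntry<>(count, word);
    }

    // Swap every line, then order the pairs by count, largest first.
    static List<Map.Entry<Integer, String>> swapAndSortDescending(List<String> lines) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (String line : lines) {
            out.add(swap(line));
        }
        out.sort(Map.Entry.<Integer, String>comparingByKey(Comparator.reverseOrder()));
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed MR-1 output format.
        List<String> lines = Arrays.asList("hadoop\t3", "the\t7", "map\t5");
        for (Map.Entry<Integer, String> e : swapAndSortDescending(lines)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In an actual old-API job, the same parsing would sit inside a mapper declared
with LongWritable and Text as its input types, and the descending order would
have to come from a key comparator configured on the job rather than from an
in-memory sort.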

