hadoop-common-user mailing list archives

From Latha <usla...@gmail.com>
Subject Re: How to modify hadoop-wordcount example to display File-wise results.
Date Sat, 18 Oct 2008 12:54:27 GMT
Hi All,

Thank you for your valuable input suggesting possible ways of
creating an index file in the following format:
word1 filename count
word2 filename count

However, the following is not working for me. Please help me resolve the
issue.

--------------------------
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private Text word = new Text();
    private Text filename = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, filename);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      int sum = 0;
      Text filename = new Text();  // must be initialized before calling set()
      while (values.hasNext()) {
        sum++;
        filename.set(values.next().toString());
      }
      output.collect(key, new Text(filename.toString() + " " + sum));
    }
  }

--------------------------
08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
task_200810170342_0010_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected
org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
        at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
        at org.myorg.WordCount$Map.map(WordCount.java:23)
        at org.myorg.WordCount$Map.map(WordCount.java:13)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
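
The stack trace points at the job configuration rather than the map code: the stock WordCount driver declares IntWritable as the output value class, and the framework checks each value the map emits against that declaration. A minimal sketch of the driver lines that would match a Text-valued map (assuming a JobConf-based driver like the original example; names are illustrative):

```java
// Hypothetical driver excerpt (old org.apache.hadoop.mapred API):
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount-per-file");

// The stock example declares IntWritable here; with a map that emits
// Text values, the value class must be Text as well, or the framework
// throws "Type mismatch in value from map".
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
// If map and reduce output types ever differ, setMapOutputValueClass(...)
// can declare the map-side type separately.
```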


Thanks
Srilatha


On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omalley@apache.org> wrote:

> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > What you need to do is snag access to the filename in the configure
> method
> > of the mapper.
>
>
> You can also do it in the map method with:
>
> ((FileSplit) reporter.getInputSplit()).getPath()
>
>
> > Then instead of outputting just the word as the key, output a pair
> > containing the word and the file name as the key.  Everything downstream
> > should remain the same.
>
>
> If you want to have each file handled by a single reduce, I'd suggest:
>
> class FileWordPair implements Writable {
>  private Text fileName;
>  private Text word;
>  ...
>  public int hashCode() {
>     return fileName.hashCode();
>  }
> }
>
> so that the HashPartitioner will send the records for file Foo to a single
> reducer. It would make sense to use this as an example for when to use
> grouping comparators (for getting a single call to reduce for each file)
> too...
>
> -- Owen
>
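
As a rough illustration of the pairing idea above, in plain Java with the Hadoop Writable plumbing omitted (class and field names are illustrative): because hashCode() depends only on the file name, a HashPartitioner-style bucket choice sends every (file, word) record for one file to the same reduce task.

```java
// Sketch: hashCode() ignores the word on purpose, so records are
// grouped by file. (Writable serialization omitted for brevity.)
class FileWordPair {
    final String fileName;
    final String word;

    FileWordPair(String fileName, String word) {
        this.fileName = fileName;
        this.word = word;
    }

    @Override
    public int hashCode() {
        return fileName.hashCode();  // partition by file only
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileWordPair)) return false;
        FileWordPair p = (FileWordPair) o;
        return fileName.equals(p.fileName) && word.equals(p.word);
    }

    // Same formula HashPartitioner uses to pick a reduce task.
    int partition(int numReduceTasks) {
        return (hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

public class PartitionDemo {
    public static void main(String[] args) {
        FileWordPair a = new FileWordPair("foo.txt", "hello");
        FileWordPair b = new FileWordPair("foo.txt", "world");
        // Both records for foo.txt land in the same partition:
        System.out.println(a.partition(4) == b.partition(4));  // true
    }
}
```

A grouping comparator that also compares only the file name would then give a single reduce() call per file, as Owen notes.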
