hadoop-mapreduce-user mailing list archives

From "Taylor, Ronald C" <ronald.tay...@pnl.gov>
Subject a question on WordCount program failure
Date Mon, 15 Feb 2010 08:27:12 GMT

Hello,

I just joined the list and have a newbie question. On a 10-node Linux cluster running
Hadoop 0.20.1, I've been trying out the WordCount program.

I have three files: WordCount.java, WordCountMapper.java, and WordCountReducer.java. The contents
of all three files are listed in full at the bottom of this message.

Compilation, jarring, and invocation all appear to work fine when done as follows:

javac WordCountMapper.java
javac WordCountReducer.java
javac WordCount.java

jar cf jarredWordCount.jar WordCountMapper.class WordCountReducer.class WordCount.class

Invocation:
hadoop jar jarredWordCount.jar WordCount "/user/rtaylor/WordCountInputDirectory" "/user/rtaylor/OutputDirectory"

%%%

However, the results are not what I expect. Here is a partial listing from one of the output
files:

artillery	1
barged	1
call	1
coalition	1
coalition	1
demonstrated	1
get	1
has	1
has	1

I was expecting, for example, to get one summed line for "coalition", like so:

coalition 2

Instead I get the two (non-summed) lines that you see above. 
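To be precise about what I expect, here is a tiny plain-Java sketch (no Hadoop; class and method names are my own invention for illustration) of the per-word summation I think the reduce step should perform:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class ExpectedCounts {
    // Sum occurrences per lowercased token - the behavior I expect
    // from the combine/reduce phase of the job.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        StringTokenizer itr = new StringTokenizer(text.toLowerCase());
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            Integer prev = counts.get(word);
            counts.put(word, prev == null ? 1 : prev + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // TreeMap iterates in sorted key order: call, coalition, has
        for (Map.Entry<String, Integer> e
                : count("call coalition has has coalition").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // prints: call 1, coalition 2, has 2 (tab-separated, one per line)
    }
}
```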

I've tried several changes, with no effect. I still get the same (wrong) output with no word
summation. This is driving me nuts, especially since I presume that I am making a simple mistake
that somebody should be able to spot easily. So - please help!

   - Ron Taylor
___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCount.java:

import java.io.*;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCount {

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();

        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Error in parameter inputs - Usage: WordCount <in> <out>");
            System.exit(2);
        }
        String inputDirectory  = otherArgs[0];
        String outputDirectory = otherArgs[1];

        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputDirectory));
        FileOutputFormat.setOutputPath(job, new Path(outputDirectory));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountMapper.java:

import java.io.*;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of WordCountReducer.java:

import java.io.*;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
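One thing I have been trying to rule out is a plain-Java trap: a method whose parameter types differ from the superclass declaration overloads rather than overrides, so the superclass default still runs when called through a superclass reference. A minimal standalone illustration (class names are made up, not Hadoop classes), in case it is relevant to my mistake:

```java
public class OverrideCheck {
    static class Base {
        // Default behavior, analogous to a framework-supplied no-op default.
        String handle(Object input) { return "base:" + input; }
    }

    static class Child extends Base {
        // Parameter type differs from Base.handle(Object), so this
        // OVERLOADS rather than overrides; Base.handle still runs
        // when invoked through a Base reference with an Object argument.
        String handle(String input) { return "child:" + input; }
    }

    public static void main(String[] args) {
        Base b = new Child();
        Object arg = "x";
        System.out.println(b.handle(arg)); // prints base:x
    }
}
```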
