hadoop-common-user mailing list archives

From Sandy <snickerdoodl...@gmail.com>
Subject extra documentation on how to write your own partitioner class
Date Fri, 30 Jan 2009 21:32:35 GMT
Hello,

Could someone point me toward some more documentation on how to write one's
own partitioner class? I am having quite a bit of trouble getting mine to
work. So far, it looks something like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Partitioner;

public class myPartitioner extends MapReduceBase implements
        Partitioner<IntWritable, IntWritable> {

    private int T;

    public void configure(JobConf job) {
        super.configure(job);
        String myT = job.get("tval");    // user-defined property set in the driver
        T = Integer.parseInt(myT);
    }

    public int getPartition(IntWritable key, IntWritable value, int numReduceTasks) {
        int newT = (T / numReduceTasks);   // number of blocks per reducer
        int id = (value.get() / T);        // block id of this value
        return id / newT;                  // map the block id to a partition
    }
}

In the run() function of my M/R program I just set it using:

conf.setPartitionerClass(myPartitioner.class);

Is there anything else I need to set in the run() function?
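For context, the rest of my driver setup is roughly the following (the job
class name, paths, and the tval value here are just placeholders for what I
actually use):

    JobConf conf = new JobConf(getConf(), MyJob.class);  // MyJob is a placeholder name
    conf.setJobName("partition-test");
    conf.set("tval", "100");                             // placeholder; read back in configure()
    conf.setNumReduceTasks(2);
    conf.setPartitionerClass(myPartitioner.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);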


The code compiles fine. When I run it, I know it is "using" the partitioner,
since I get different output than if I just let it use the default
HashPartitioner. However, it is not splitting the output between the reducers
at all! If I set the number of reducers to 2, all the output shows up in
part-00000, while part-00001 is empty.
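Just to make the arithmetic concrete: if tval were 100 and there were 2
reducers, then newT would be 50, and a value of 4999 would give id = 49 and
partition 49/50 = 0; only values of 5000 or more would ever reach partition 1.
Those numbers are made up, though, so I don't know whether something similar
is happening with my actual data.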

I am having trouble debugging this since I don't know how I can observe the
value of numReduceTasks (which I assume is being set by the framework). Is
that a safe assumption?

If I try to insert any println() statements in the function, nothing shows up
in either my terminal or my log files. Could someone give me some general
advice on how best to debug pieces of code like this?
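One thing I have been wondering is whether I could just exercise the
partitioner outside of Hadoop entirely, with something like the sketch below
(the tval setting and the sample value are made-up numbers); I'm not sure
whether that would really reflect what the framework does at runtime:

    // Standalone check of the partitioner, outside the cluster.
    JobConf job = new JobConf();
    job.set("tval", "100");              // made-up value for testing

    myPartitioner p = new myPartitioner();
    p.configure(job);

    IntWritable key = new IntWritable(1);
    IntWritable value = new IntWritable(4999);
    System.out.println("partition = " + p.getPartition(key, value, 2));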
