hadoop-common-user mailing list archives

From Sandy <snickerdoodl...@gmail.com>
Subject Re: extra documentation on how to write your own partitioner class
Date Fri, 30 Jan 2009 23:43:21 GMT
Hi James,

Thank you very much! :-)

-SM

On Fri, Jan 30, 2009 at 4:17 PM, james warren <james@rockyou.com> wrote:

> Hello Sandy -
> Your partitioner isn't using any information from the key/value pair - it's
> only using the value T, which is read once from the job configuration.
> getPartition() will always return the same value, so all of your data is
> being sent to one reducer. :P
>
> cheers,
> -James
>
> On Fri, Jan 30, 2009 at 1:32 PM, Sandy <snickerdoodle08@gmail.com> wrote:
>
> > Hello,
> >
> > Could someone point me toward more documentation on how to write one's
> > own partitioner class? I am having quite a bit of trouble getting mine to
> > work. So far, it looks something like this:
> >
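> > import org.apache.hadoop.io.IntWritable;
> > import org.apache.hadoop.mapred.JobConf;
> > import org.apache.hadoop.mapred.MapReduceBase;
> > import org.apache.hadoop.mapred.Partitioner;
> >
> > public class myPartitioner extends MapReduceBase implements
> >         Partitioner<IntWritable, IntWritable> {
> >
> >     private int T;
> >
> >     // Called once per task with the job configuration.
> >     public void configure(JobConf job) {
> >         super.configure(job);
> >         String myT = job.get("tval");   // "tval" is a user-defined property
> >         T = Integer.parseInt(myT);
> >     }
> >
> >     // Must return a partition number in [0, numReduceTasks).
> >     public int getPartition(IntWritable key, IntWritable value,
> >             int numReduceTasks) {
> >         int newT = T / numReduceTasks;
> >         int id = value.get() / T;
> >         return id / newT;
> >     }
> > }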
> >
> > In the run() function of my M/R program I just set it using:
> >
> > conf.setPartitionerClass(myPartitioner.class);
> >
> > Is there anything else I need to set in the run() function?
> >
> >
> > The code compiles fine. When I run it, I know it is "using" the
> > partitioner, since I get different output than if I just let it use
> > HashPartitioner. However, it is not splitting the data between the
> > reducers at all! If I set the number of reducers to 2, all the output
> > shows up in part-00000, while part-00001 has nothing.
> >
> > I am having trouble debugging this since I don't know how I can observe
> > the value of numReduceTasks (which I assume is being set by the system).
> > Is this a correct assumption?
> >
> > If I insert println() statements in the function, the output doesn't show
> > up in either my terminal or my log files. Could someone give me some
> > general advice on how best to debug pieces of code like this?
> >
>
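A note on the diagnosis: as quoted, getPartition() does read value.get(), but with integer division value.get() / T is 0 for every value in [0, T), so the method returns 0 whenever the values stay below T (an assumption; the thread does not say what range the values cover). If T is smaller than numReduceTasks, T / numReduceTasks is 0 and the final division throws ArithmeticException instead. The sketch below pulls the arithmetic into plain static methods so it can be checked with an ordinary main() rather than inside a running job, which also sidesteps the println() problem. The class name RangePartitionCheck and both method names are hypothetical, and rangePartition() is just one possible fix (equal-width buckets over [0, T)), not something proposed in the thread.

```java
// Hypothetical standalone harness: no Hadoop classes needed to check the math.
class RangePartitionCheck {

    // The arithmetic from the quoted getPartition(), with the unbalanced
    // parenthesis removed. Throws ArithmeticException when t < numReduceTasks,
    // because newT becomes 0.
    static int originalPartition(int value, int t, int numReduceTasks) {
        int newT = t / numReduceTasks;
        int id = value / t;          // 0 for every value in [0, t)
        return id / newT;
    }

    // Sketch of a range partitioner: split [0, t) into numReduceTasks
    // equal-width buckets and send each value to its bucket.
    static int rangePartition(int value, int t, int numReduceTasks) {
        // Ceiling division keeps the width >= 1 even when t < numReduceTasks.
        int width = (t + numReduceTasks - 1) / numReduceTasks;
        int p = value / width;
        // Clamp so values outside [0, t) still map to a legal partition.
        return Math.max(0, Math.min(p, numReduceTasks - 1));
    }

    public static void main(String[] args) {
        int t = 100;       // stands in for the "tval" job property
        int reducers = 2;  // stands in for numReduceTasks
        for (int value : new int[] {0, 25, 49, 50, 75, 99}) {
            System.out.println(value + " -> original "
                    + originalPartition(value, t, reducers)
                    + ", range " + rangePartition(value, t, reducers));
        }
    }
}
```

Running main() prints "original 0" for every sample value, while rangePartition() sends 0 through 49 to partition 0 and 50 through 99 to partition 1. Inside myPartitioner, getPartition() could then simply return rangePartition(value.get(), T, numReduceTasks). The numReduceTasks argument is the number of reduce tasks configured for the job (JobConf.setNumReduceTasks()). Because partitioning happens inside the map task, anything printed from getPartition() on a cluster generally ends up in that map task attempt's stdout log rather than in the terminal that submitted the job.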
