hadoop-common-user mailing list archives

From james warren <ja...@rockyou.com>
Subject Re: extra documentation on how to write your own partitioner class
Date Wed, 04 Feb 2009 01:09:22 GMT
Hah - I should've read the code more closely. Completely agree with Aaron's
assessment, and will drink more coffee before writing emails. :)

Thanks for the correction.

On Tue, Feb 3, 2009 at 4:42 PM, Aaron Kimball <aaron@cloudera.com> wrote:

> er?
>
> It seems to be using value.get(). That having been said, you should really
> partition based on key, not on value. (I am not sure why, exactly, the value
> is provided to the getPartition() method.)
>
>
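A minimal illustration of why the key is the safer input: the reduce side groups values by key, so every record sharing a key has to reach the same partition, and only a function of the key guarantees that. The sketch simulates this with plain ints standing in for IntWritable (whose hashCode() returns the wrapped int). `KeyVsValueDemo` and its methods are hypothetical; the masked-modulus formula mirrors the one Hadoop's HashPartitioner uses.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: plain ints stand in for IntWritable keys and values.
class KeyVsValueDemo {

    // HashPartitioner-style: clear the sign bit of the key, then take the modulus.
    static int byKey(int key, int value, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    // Same formula applied to the value: the key is ignored entirely.
    static int byValue(int key, int value, int numPartitions) {
        return (value & Integer.MAX_VALUE) % numPartitions;
    }

    // Collects the distinct partitions that receive records for one key.
    static Set<Integer> partitionsForKey(boolean useKey, int key, int[] values, int numPartitions) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            seen.add(useKey ? byKey(key, v, numPartitions) : byValue(key, v, numPartitions));
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};  // four records that all share key 7
        System.out.println(partitionsForKey(true, 7, values, 2).size());   // 1
        System.out.println(partitionsForKey(false, 7, values, 2).size());  // 2
    }
}
```

Partitioning on the value splits key 7's records across both reducers, so neither reducer sees the complete group for that key.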
> Moreover, I think the problem is that you are using division (/), not
> modulus (%). Your code simplifies to:
> (value.get() / T) / (T / numPartitions) = value.get() * numPartitions / T^2.
>
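With Java's truncating integer division that simplification is an approximation. Writing v for value.get() (assumed nonnegative), n for numPartitions, and assuming T >= 1, the method actually computes:

```latex
p = \left\lfloor \frac{\lfloor v / T \rfloor}{\lfloor T / n \rfloor} \right\rfloor,
\qquad
p = 0 \iff \lfloor v / T \rfloor < \lfloor T / n \rfloor \quad (T \ge n)
```

When T < n the divisor floor(T/n) is 0 and the call throws an ArithmeticException, and nothing bounds p by n - 1, so large values fall outside [0, n). With T = 100 and n = 2, every value below 5000 maps to partition 0 and the value 10000 maps to partition 2. The test v * n < T^2 is close but not exact: with T = 10 and n = 3, the value 33 gives v * n = 99 < 100, yet p = 1.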
> The contract of getPartition() is that it returns a value in
> [0, numPartitions). The division operators are not guaranteed to return
> anything in this range, but (foo % numPartitions) will always do the right
> thing. So it's probably just assigning everything to reduce partition 0.
> (Alternatively, it could be that value * numPartitions < T^2 for every value
> and T you're testing with, which means that integer division will return 0.)
>
> - Aaron
>
>
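One Java caveat on the `%` advice: the remainder takes the sign of the dividend, so `foo % numPartitions` is only guaranteed to land in [0, numPartitions) when foo is nonnegative. `Math.abs` does not fully fix this, because `Math.abs(Integer.MIN_VALUE)` is still negative. Hadoop's HashPartitioner sidesteps both problems by masking off the sign bit first. The class and method names below are hypothetical.

```java
// Hypothetical helper comparing three ways to turn an int key into a partition.
class ModulusContract {

    // Java's % keeps the sign of the dividend, so a negative key gives a negative result.
    static int naive(int key, int numPartitions) {
        return key % numPartitions;
    }

    // Math.abs(Integer.MIN_VALUE) overflows back to Integer.MIN_VALUE, so this can go negative too.
    static int viaAbs(int key, int numPartitions) {
        return Math.abs(key) % numPartitions;
    }

    // The HashPartitioner form: clear the sign bit, then take the modulus.
    static int masked(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(naive(-7, 2));                  // -1
        System.out.println(viaAbs(Integer.MIN_VALUE, 3));  // -2
        System.out.println(masked(-7, 2));                 // 1
    }
}
```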
> On Fri, Jan 30, 2009 at 3:43 PM, Sandy <snickerdoodle08@gmail.com> wrote:
>
> > Hi James,
> >
> > Thank you very much! :-)
> >
> > -SM
> >
> > On Fri, Jan 30, 2009 at 4:17 PM, james warren <james@rockyou.com> wrote:
> >
> > > Hello Sandy -
> > > Your partitioner isn't using any information from the key/value pair;
> > > it's only using the value T, which is read once from the job configuration.
> > > getPartition() will always return the same value, so all of your data is
> > > being sent to one reducer. :P
> > >
> > > cheers,
> > > -James
> > >
> > > On Fri, Jan 30, 2009 at 1:32 PM, Sandy <snickerdoodle08@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Could someone point me toward some more documentation on how to write
> > > > one's own partitioner class? I am having quite a bit of trouble getting
> > > > mine to work. So far, it looks something like this:
> > > >
> > > > import org.apache.hadoop.io.IntWritable;
> > > > import org.apache.hadoop.mapred.JobConf;
> > > > import org.apache.hadoop.mapred.MapReduceBase;
> > > > import org.apache.hadoop.mapred.Partitioner;
> > > >
> > > > public class myPartitioner extends MapReduceBase implements
> > > >         Partitioner<IntWritable, IntWritable> {
> > > >
> > > >     private int T;
> > > >
> > > >     public void configure(JobConf job) {
> > > >         super.configure(job);
> > > >         String myT = job.get("tval");   // user-defined job parameter
> > > >         T = Integer.parseInt(myT);
> > > >     }
> > > >
> > > >     public int getPartition(IntWritable key, IntWritable value,
> > > >             int numReduceTasks) {
> > > >         // integer (truncating) division at every step
> > > >         int newT = T / numReduceTasks;
> > > >         int id = value.get() / T;
> > > >         return id / newT;
> > > >     }
> > > > }
> > > >
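If the intent of the getPartition() above was to split values in [0, T) into numReduceTasks contiguous ranges (an assumption; the message does not say so), one sketch that respects the [0, numPartitions) contract multiplies before dividing and then clamps. `RangePartition.rangePartition` is a hypothetical helper over plain ints; in a real partitioner its body would sit inside getPartition(), applied to whichever field should choose the reducer (the replies recommend the key).

```java
// Hypothetical range partitioner: values in [0, t) are split into numPartitions
// contiguous slices; values outside that interval are clamped to the nearest end.
class RangePartition {

    static int rangePartition(int v, int t, int numPartitions) {
        if (t <= 0 || numPartitions <= 0) {
            throw new IllegalArgumentException("t and numPartitions must be positive");
        }
        // Multiply before dividing, in long arithmetic, so a small t / numPartitions
        // ratio neither truncates to 0 nor overflows.
        long p = (long) v * numPartitions / t;
        if (p < 0) return 0;
        if (p >= numPartitions) return numPartitions - 1;
        return (int) p;
    }

    public static void main(String[] args) {
        // t = 100 across 2 reducers: 0..49 -> 0, 50..99 -> 1
        System.out.println(rangePartition(49, 100, 2));   // 0
        System.out.println(rangePartition(50, 100, 2));   // 1
        System.out.println(rangePartition(500, 100, 2));  // 1 (clamped)
    }
}
```

Unlike the modulus, this keeps neighboring values on the same reducer, at the cost of uneven load when the values are not spread evenly over [0, T).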
> > > > In the run() function of my M/R program I just set it using:
> > > >
> > > > conf.setPartitionerClass(myPartitioner.class);
> > > >
> > > > Is there anything else I need to set in the run() function?
> > > >
> > > >
> > > > The code compiles fine. When I run it, I know it is "using" the
> > > > partitioner, since I get different output than if I just let it use
> > > > HashPartitioner. However, it is not splitting between the reducers at
> > > > all! If I set the number of reducers to 2, all the output shows up in
> > > > part-00000, while part-00001 has nothing.
> > > >
> > > > I am having trouble debugging this since I don't know how I can observe
> > > > the value of numReduceTasks (which I assume is being set by the
> > > > system). Is that assumption correct?
> > > >
> > > > If I insert any println() statements in the function, nothing shows up
> > > > in either my terminal or my log files. Could someone give me some
> > > > general advice on how best to debug pieces of code like this?
> > > >
> > >
> >
>
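On the debugging question: because the partition arithmetic is plain Java, one option is to exercise it locally before submitting a job, counting how many sample keys land in each partition and failing fast on anything outside [0, numPartitions). The sketch below is hypothetical (`PartitionCheck`, the sample keys 0..999, and T = 100 are illustrative); it uses `java.util.function.IntBinaryOperator` to stand in for getPartition() and calls no Hadoop API.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

class PartitionCheck {

    // Applies partition(key, numPartitions) to every sample key and returns the
    // number of keys per partition; throws on any out-of-range result.
    static int[] histogram(IntBinaryOperator partition, int[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int k : keys) {
            int p = partition.applyAsInt(k, numPartitions);
            if (p < 0 || p >= numPartitions) {
                throw new IllegalStateException("key " + k + " -> partition " + p);
            }
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = i;  // sample keys 0..999

        final int t = 100;  // illustrative stand-in for the "tval" setting
        // The arithmetic from the original getPartition(), applied to the sample ints.
        IntBinaryOperator original = (k, n) -> (k / t) / (t / n);
        // HashPartitioner-style masked modulus.
        IntBinaryOperator modulus = (k, n) -> (k & Integer.MAX_VALUE) % n;

        System.out.println(Arrays.toString(histogram(modulus, keys, 2)));   // [500, 500]
        System.out.println(Arrays.toString(histogram(original, keys, 2)));  // [1000, 0]
    }
}
```

The second line reproduces the reported symptom: with small inputs, the original arithmetic sends everything to partition 0 (part-00000).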
