hadoop-common-user mailing list archives

From Mithila Nagendra <mnage...@asu.edu>
Subject Custom partitioner for hadoop
Date Wed, 25 Aug 2010 16:40:25 GMT
I came across a tutorial on creating a custom partitioner on Hadoop
(http://philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/).
I am trying to create my own partitioner on Hadoop, and the above blog
has given me a good starting point.

I have a question about the partitioner. The code given in the blog has:

if( nbOccurences < 3 )
       return 0;
else
       return 1;

I want to do something similar, but I need the key to fall in a range, like
the following:

if( nbOccurences > lbrange0 && nbOccurences < ubrange0 )
       return 0;
if( nbOccurences > lbrange1 && nbOccurences < ubrange1 )
       return 1;

The range boundaries lbrange0, lbrange1, ubrange0, and ubrange1 are
calculated by reading a histogram that is stored on HDFS. I initially
thought I could read the histogram from the customPartitioner class and
calculate the range boundaries there, but in that case the ranges would be
recalculated for every <K,V> pair emitted by the mapper. To avoid this, I was
thinking of passing the range boundaries to the partitioner. How would I do
that? Is there an alternative? Any suggestions would be useful.
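
To make the idea concrete, here is a minimal sketch in plain Java (no Hadoop
dependency) of the "compute once, look up many times" pattern I have in mind.
The class name RangeBoundaries and the "lb0:ub0,lb1:ub1" string format are
assumptions made up for illustration. The thought is that the driver could
derive the boundaries from the histogram, hand them over as a single string
(for example as a job configuration property), and the partitioner could
parse that string once in a setup hook such as configure(JobConf) from the
org.apache.hadoop.mapred API, then call only the cheap lookup for each pair.

```java
/** Hypothetical helper: boundaries are parsed once, then reused for every key. */
final class RangeBoundaries {
    private final long[] lower; // exclusive lower bound of each range
    private final long[] upper; // exclusive upper bound of each range

    /** Parses "lb0:ub0,lb1:ub1,..." (an assumed format) a single time. */
    RangeBoundaries(String spec) {
        String[] ranges = spec.split(",");
        lower = new long[ranges.length];
        upper = new long[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            String[] bounds = ranges[i].trim().split(":");
            lower[i] = Long.parseLong(bounds[0].trim());
            upper[i] = Long.parseLong(bounds[1].trim());
        }
    }

    /** Returns the index of the first range with lower < n < upper, or -1 if none. */
    int partitionFor(long nbOccurences) {
        for (int i = 0; i < lower.length; i++) {
            if (nbOccurences > lower[i] && nbOccurences < upper[i]) {
                return i;
            }
        }
        return -1; // on a boundary or outside every range
    }

    public static void main(String[] args) {
        // Parsed once; in a partitioner this would happen in the setup hook.
        RangeBoundaries ranges = new RangeBoundaries("0:10,10:100");
        // Called per <K,V> pair; no histogram access is needed here.
        System.out.println(ranges.partitionFor(5));
        System.out.println(ranges.partitionFor(50));
        System.out.println(ranges.partitionFor(10));
    }
}
```

Note that with strict comparisons on both ends, as in my snippet above, a
value sitting exactly on a boundary matches no range. A real getPartition
must return a value between 0 and numPartitions - 1, so the -1 case would
need an explicit fallback.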

Thank you,

Mithila
Ph.D. Candidate, C.S., Arizona State University
