hadoop-common-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Re: Custom partitioner for hadoop
Date Wed, 25 Aug 2010 19:50:00 GMT
On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
> In order to avoid this I was thinking of
> passing the range boundaries to the partitioner. How would I do that? Is
> there an alternative? Any suggestion would prove useful.

We use a custom partitioner, for which we pass in configuration data 
that gets used in the partitioning calculations.
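For the range case in the question, the "partitioning calculations" could look something like the minimal sketch below. It assumes the boundaries are sorted numeric keys stored as one comma-separated string under a hypothetical property name (`example.range.boundaries`). It also uses a plain `java.util.Map` in place of Hadoop's `Configuration` so that it runs without Hadoop. `RangeLookup`, `parseBoundaries` and `rangeFor` are illustrative names, not Hadoop API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RangeLookup {
    // Hypothetical property name; not a Hadoop-defined key.
    static final String BOUNDARIES_KEY = "example.range.boundaries";

    // Parse e.g. "100,200,300" into a sorted array of boundaries.
    static long[] parseBoundaries(String csv) {
        String[] parts = csv.split(",");
        long[] bounds = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bounds[i] = Long.parseLong(parts[i].trim());
        }
        Arrays.sort(bounds);
        return bounds;
    }

    // n boundaries give n + 1 ranges:
    // key < b[0] -> 0, b[0] <= key < b[1] -> 1, ..., key >= b[n-1] -> n.
    static int rangeFor(long key, long[] bounds) {
        int idx = Arrays.binarySearch(bounds, key);
        // Exact hit on boundary i: the key starts range i + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the range index.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration filled in at submission time.
        Map<String, String> conf = new HashMap<>();
        conf.put(BOUNDARIES_KEY, "100,200,300");

        long[] bounds = parseBoundaries(conf.get(BOUNDARIES_KEY));
        for (long key : new long[] {5, 100, 250, 999}) {
            System.out.println(key + " -> partition " + rangeFor(key, bounds));
        }
    }
}
```

With n boundaries this yields n + 1 ranges, so the job would need n + 1 reduce tasks (or the result clamped to `numPartitions - 1`). Parsing the string once up front, rather than in every `getPartition()` call, keeps the per-record cost to one binary search.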

We do it by making the Partitioner implement Configurable, and then grabbing 
the needed config data from the Configuration object that we're given. 
(We set the needed config data on the Configuration object when we submit the 
job.)  I.e., like so:

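import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class OurPartitioner extends Partitioner<BytesWritable, Writable>
		implements Configurable {
	// ...

	@Override
	public int getPartition(BytesWritable key, Writable value, int numPartitions) {
		// ... (partitioning logic, using the values read in configure())
	}

	public Configuration getConf() {
		return conf;
	}

	public void setConf(Configuration conf) {
		this.conf = conf;

		// Hadoop calls setConf() when it instantiates the partitioner, so the
		// job's parameters are read before getPartition() is ever called.
		configure();
	}

	// Reads the parameter set at job-submission time and fails fast if it's
	// missing. "<parmKey>" is a placeholder for your own property name.
	@SuppressWarnings("unchecked")
	private void configure() {
		String parmValue = conf.get("<parmKey>");
		if (parmValue == null) {
			throw new RuntimeException(/* ..... */);
		}
	}

	private Configuration conf;
}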
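Implementing Configurable works because Hadoop creates the partitioner through `ReflectionUtils.newInstance`, which calls `setConf()` on any object that implements `Configurable` before handing it back. The sketch below imitates that ordering in plain Java so it can run standalone. `MiniConfigurable`, `SplitPartitioner`, the local `newInstance` helper and the `example.partition.split` key are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableLifecycle {
    // Hypothetical stand-in for org.apache.hadoop.conf.Configurable,
    // with a Map playing the role of the Configuration.
    interface MiniConfigurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    // Hypothetical two-range partitioner: keys below the split point go to
    // partition 0, everything else to partition 1.
    static class SplitPartitioner implements MiniConfigurable {
        static final String SPLIT_KEY = "example.partition.split"; // hypothetical key
        private Map<String, String> conf;
        private long split;

        public void setConf(Map<String, String> conf) {
            this.conf = conf;
            String value = conf.get(SPLIT_KEY);
            if (value == null) {
                // Fail fast, as in configure() above.
                throw new RuntimeException(SPLIT_KEY + " not set in job configuration");
            }
            split = Long.parseLong(value);
        }

        public Map<String, String> getConf() {
            return conf;
        }

        int getPartition(long key, int numPartitions) {
            int range = key < split ? 0 : 1;
            return Math.min(range, numPartitions - 1); // stay in [0, numPartitions)
        }
    }

    // Mimics the idea behind ReflectionUtils.newInstance: construct
    // reflectively, then call setConf() if the object is configurable.
    static <T> T newInstance(Class<T> cls, Map<String, String> conf)
            throws ReflectiveOperationException {
        T obj = cls.getDeclaredConstructor().newInstance();
        if (obj instanceof MiniConfigurable) {
            ((MiniConfigurable) obj).setConf(conf);
        }
        return obj;
    }

    public static void main(String[] args) throws Exception {
        // Submission side: the driver puts the parameter into the config.
        Map<String, String> conf = new HashMap<>();
        conf.put(SplitPartitioner.SPLIT_KEY, "100");

        // Task side: the "framework" builds the partitioner with that config.
        SplitPartitioner p = newInstance(SplitPartitioner.class, conf);
        System.out.println(p.getPartition(42, 2));
        System.out.println(p.getPartition(150, 2));
    }
}
```

If the driver forgets to set the key, `newInstance` throws from `setConf()`, so the problem surfaces when the partitioner is created rather than as misrouted records later.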

HTH,

DR
