incubator-chukwa-user mailing list archives

From Corbin Hoenes <>
Subject ChukwaRecordOutputFormat only works with ChukwaRecordPartitioner
Date Wed, 21 Jul 2010 06:48:12 GMT
I was looking at replacing the ChukwaRecordPartitioner with a HashbasedRecordParitioner. We
discussed this earlier on this list, and there is an issue in JIRA:

I patched Chukwa to allow for a pluggable partitioner and configured it to use the hash-based
partitioner. But it started failing to rename the _temporary files during the commit phase
after the reduce finished, because multiple reducers were now trying to move files with the
same filename into /chukwa/demuxProcessing/mrOutput. So I added a bit more to the filename
in ChukwaRecordOutputFormat:

// Returns "part<N>", where N is the reducer partition this record is assigned to.
private String getParition(ChukwaRecordKey key, ChukwaRecord record) {
	return "part" + paritioner.getPartition(key, record, conf.getInt("mapred.reduce.tasks", 0));
}

protected String generateFileNameForKeyValue(ChukwaRecordKey key,
	ChukwaRecord record, String name) {
	// Include the partition in the filename so each reducer writes a distinct file.
	String output = RecordUtil.getClusterName(record) + "/"
			+ key.getReduceType() + "/" + key.getReduceType() + "_" + getParition(key, record)
			+ Util.generateTimeOutput(record.getTime());

	return output;
}

So my filenames are now /chukwa/demuxProcessing/mrOutput/MyCluster/MyDataType/MyDataType_part0_20100720_0_35.R.evt
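
To make the naming scheme above easier to experiment with outside Hadoop, here is a minimal standalone sketch. It is not the Chukwa code: the class PartitionedFileNames, the methods partitionFor and fileName, and the idea of passing the time suffix in as a plain string are all assumptions made for illustration, and the hash-modulo formula merely stands in for whatever the configured partitioner does.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the partition-aware file naming described above.
class PartitionedFileNames {

    // Hash-modulo partitioning; masking with Integer.MAX_VALUE keeps the
    // result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Builds <cluster>/<dataType>/<dataType>_part<N><timeSuffix>.
    // timeSuffix is assumed to already carry its leading "_" and extension.
    static String fileName(String cluster, String dataType, int partition, String timeSuffix) {
        return cluster + "/" + dataType + "/" + dataType + "_part" + partition + timeSuffix;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;                  // assumed reducer count
        String suffix = "_20100720_0_35.R.evt";  // suffix taken from the example filename
        Set<String> names = new LinkedHashSet<>();
        // Two keys that land on different reducers get different file names,
        // even though cluster, data type and time bucket are identical.
        for (String key : new String[] {"a", "b"}) {
            int partition = partitionFor(key, numReduceTasks);
            names.add(fileName("MyCluster", "MyDataType", partition, suffix));
        }
        for (String n : names) {
            System.out.println(n);
        }
    }
}
```

Without the partition component both keys would map to the same relative path, which is the collision that showed up at commit time.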

I just added the partition to the filename, and now when PostProcessorManager picks up that
directory it can mv each file into the correct time bucket in /chukwa/repos (it increments a
count for each file in that directory).

Is there a better solution? I am not sure how general-purpose my solution is.