chukwa-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: ChukwaRecordOutputFormat only works with ChukwaRecordPartitioner
Date Wed, 21 Jul 2010 16:35:48 GMT
I think this is in the right direction.  Does this filename convention
allow dfs -getmerge to work on the directory?  If it does, then I am fine
with it.  If it doesn't, it may be good to label the output file name as
MyDataType_20100720_0_35.R_part0 to align with the default output name of
mapreduce.
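
To make the two naming layouts concrete, here is a minimal Java sketch. It is not Chukwa code: the class DemuxFileNameSketch and its methods are hypothetical, the time bucket string is taken from the example filenames in this thread, and whether the suggested layout should still end in .evt is left open.

```java
import java.util.Objects;

// Hypothetical helper for illustration only; none of these names exist in Chukwa.
final class DemuxFileNameSketch {

    // Hash-style partition index in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE keeps the value non-negative;
    // a non-positive task count falls back to partition 0 (an assumption).
    static int partitionFor(String key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            return 0;
        }
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Layout from the quoted patch: partition before the time bucket.
    static String partFirst(String dataType, int partition, String timeBucket) {
        return dataType + "_part" + partition + "_" + timeBucket + ".evt";
    }

    // Layout suggested in this reply: partition as a trailing suffix.
    static String partLast(String dataType, int partition, String timeBucket) {
        return dataType + "_" + timeBucket + "_part" + partition;
    }

    public static void main(String[] args) {
        String bucket = "20100720_0_35.R";
        System.out.println(partFirst("MyDataType", 0, bucket));
        System.out.println(partLast("MyDataType", 0, bucket));
    }
}
```

Either way, two reducers with different partition numbers produce different names, which is what avoids the rename collision; the layouts differ only in where the partition label sits.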

Regards,
Eric

On 7/20/10 11:48 PM, "Corbin Hoenes" <corbin@tynt.com> wrote:

> I was looking at replacing the ChukwaRecordPartitioner with a
> HashbasedRecordParitioner. We discussed this earlier here; there is an
> issue in JIRA: https://issues.apache.org/jira/browse/CHUKWA-481
> 
> I patched Chukwa to allow for a pluggable partitioner and configured Chukwa to
> use the hash-based partitioner.  But it started failing to rename the
> _temporary files during the commit phase after the reduce finished, because
> multiple reducers were now trying to move files to
> /chukwa/demuxProcessing/mrOutput with the same filename.  So I added a bit
> more to the filename in ChukwaRecordOutputFormat:
> 
> private String getParition(ChukwaRecordKey key, ChukwaRecord record) {
>   // Label the file with the reducer partition this record hashes to.
>   return "part" + paritioner.getPartition(key, record,
>       conf.getInt("mapred.reduce.tasks", 0));
> }
>
> @Override
> protected String generateFileNameForKeyValue(ChukwaRecordKey key,
>     ChukwaRecord record, String name) {
>   // <cluster>/<reduceType>/<reduceType>_part<N><time suffix>
>   String output = RecordUtil.getClusterName(record) + "/"
>       + key.getReduceType() + "/" + key.getReduceType() + "_"
>       + getParition(key, record)
>       + Util.generateTimeOutput(record.getTime());
>
>   return output;
> }
> 
> So my filenames are now:
> /chukwa/demuxProcessing/mrOutput/MyCluster/MyDataType/MyDataType_part0_20100720_0_35.R.evt
> 
> I just added the part to the filename, and now when PostProcessorManager picks up
> that directory it can mv each file into the correct time bucket in
> /chukwa/repos (it increments a count for each file in that directory).
> 
> Is there a better solution? I am not sure how general-purpose my solution is.
> 

