From: Eric Yang
Date: Wed, 21 Jul 2010 09:35:48 -0700
Subject: Re: ChukwaRecordOutputFormat only works with ChukwaRecordPartitioner
Reply-To: chukwa-user@hadoop.apache.org
In-Reply-To: <681A10D4-2345-4C9A-8445-E322CFE3E7E4@tynt.com>

I think this is in the right direction. Does this filename convention allow
dfs -getmerge to work on the directory? If it does, then I am fine with it.
If it doesn't, it may be good to label the output file name as
MyDataType_20100720_0_35.R_part0 to align with the default output name of
mapreduce.

Regards,
Eric

On 7/20/10 11:48 PM, "Corbin Hoenes" <corbin@tynt.com> wrote:

> I was looking at replacing the ChukwaRecordPartitioner with a
> HashbasedRecordParitioner. We discussed this earlier here; there is an
> issue in JIRA: https://issues.apache.org/jira/browse/CHUKWA-481
>
> I patched chukwa to allow for a pluggable partitioner and configured chukwa to
> use the hash based partitioner. But it started failing to rename the
> _temporary files during the commit phase after the reduce was finished, because
> now there were multiple reducers trying to move files to
> /chukwa/demuxProcessing/mrOutput with the same filename. So I added a bit
> more to the filename in ChukwaRecordOutputFormat:
>
> // Tag each output file with the partition (reducer) that wrote it.
> private String getParition(ChukwaRecordKey key, ChukwaRecord record) {
>   return "part" + paritioner.getPartition(key, record,
>       conf.getInt("mapred.reduce.tasks", 0));
> }
>
> @Override
> protected String generateFileNameForKeyValue(ChukwaRecordKey key,
>     ChukwaRecord record, String name) {
>
>   // <cluster>/<type>/<type>_part<N><time bucket>
>   String output = RecordUtil.getClusterName(record) + "/"
>       + key.getReduceType() + "/" + key.getReduceType() + "_"
>       + getParition(key, record)
>       + Util.generateTimeOutput(record.getTime());
>
>   return output;
> }
>
> So my filenames are now
> /chukwa/demuxProcessing/mrOutput/MyCluster/MyDataType/MyDataType_part0_20100720_0_35.R.evt
>
> I just added the part to the filename, and now when PostProcessorManager picks up
> that directory it can mv each file into the correct time bucket in
> /chukwa/repos (it increments a count for each file in that directory).
>
> Is there a better solution? I am not sure how general-purpose my solution is.
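To make the two naming layouts in this thread easier to compare, here is a minimal stand-alone sketch. It is not Chukwa code: the class `PartFileNames`, its method names, and the sample time suffix `_20100720_0_35.R` are assumptions made for illustration, and the `.evt` extension is left out. It only shows that a name built without a partition tag is identical for every reducer (the rename collision described above), while either tagged layout gives one distinct name per reducer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Chukwa or Hadoop.
class PartFileNames {

    // No partition tag: every reducer produces the same name for a given
    // data type and time bucket.
    static String withoutPartition(String dataType, String timeSuffix) {
        return dataType + timeSuffix;
    }

    // Layout from the patch: the tag sits between the data type and the time suffix.
    static String partitionInMiddle(String dataType, int partition, String timeSuffix) {
        return dataType + "_part" + partition + timeSuffix;
    }

    // Layout suggested in the reply: the tag goes at the end of the name.
    static String partitionAtEnd(String dataType, int partition, String timeSuffix) {
        return dataType + timeSuffix + "_part" + partition;
    }

    public static void main(String[] args) {
        String dataType = "MyDataType";
        String timeSuffix = "_20100720_0_35.R"; // assumed sample value
        int reducers = 3;                       // assumed reducer count

        Set<String> untagged = new HashSet<>();
        Set<String> tagged = new HashSet<>();
        for (int p = 0; p < reducers; p++) {
            untagged.add(withoutPartition(dataType, timeSuffix));
            tagged.add(partitionAtEnd(dataType, p, timeSuffix));
        }

        System.out.println(untagged.size()); // 1: all reducers collide on one name
        System.out.println(tagged.size());   // 3: one distinct name per reducer
        System.out.println(partitionInMiddle(dataType, 0, timeSuffix));
        System.out.println(partitionAtEnd(dataType, 0, timeSuffix));
    }
}
```

Whether `dfs -getmerge` or PostProcessorManager handles one layout better than the other is the open question in the thread; the sketch does not answer it.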