From: Bill Graham <billgraham@gmail.com>
To: Eric Yang <eyang@yahoo-inc.com>
Cc: chukwa-user@hadoop.apache.org
Date: Tue, 22 Dec 2009 14:40:37 -0800
Subject: Re: How to deploy a custom processor to demux

Thanks for your quick reply, Eric.

The TsProcessor does use buildGenericRecord and has been working fine for me (at least I thought it was). I've mapped it to my dataType as you described without problems. My only point with issue #1 was just that the documentation is off and that the DefaultProcessor yields what I think is unexpected behavior.
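For the archives, the mapping entry in $CHUKWA_HOME/conf/chukwa-demux-conf.xml looks roughly like this. It's a sketch assuming the property name is the recordType; MyDataType is a stand-in for my real data type:

<!-- Sketch: map the record type MyDataType to TsProcessor.
     MyDataType is a placeholder; the name-is-recordType
     convention is assumed here. -->
<property>
  <name>MyDataType</name>
  <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.TsProcessor</value>
  <description>Parser class for the MyDataType record type</description>
</property>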

> There is a plan to load parser classes from the classpath using Java annotations.
> It is still in the initial phase of planning. Design participation is welcome.

Yes, annotations would be useful. Or what about just having an extensions directory (maybe lib/ext/) or something similar, where custom jars could be placed to be submitted with the demux M/R job? Do you know where the code resides that handles adding the chukwa-core jar? I poked around a bit but couldn't find it.
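To sketch what I have in mind (everything here is hypothetical, not existing Chukwa code: the lib/ext convention, the DemuxExtensions class, and the HDFS staging path), the demux driver could push extension jars onto the job classpath via Hadoop's DistributedCache:

import java.io.File;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class DemuxExtensions {

  // Hypothetical helper: ship every jar found in lib/ext/ with the
  // demux M/R job by staging it in HDFS and adding it to the job's
  // classpath, so custom processors need not be patched into
  // chukwa-core.
  public static void addExtensionJars(JobConf conf, File extDir)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    File[] jars = extDir.listFiles();
    if (jars == null) {
      return; // no extensions directory present
    }
    for (File jar : jars) {
      if (!jar.getName().endsWith(".jar")) {
        continue;
      }
      Path src = new Path(jar.getAbsolutePath());
      Path dst = new Path("/tmp/demux-ext/" + jar.getName());
      fs.copyFromLocalFile(src, dst); // stage the jar in HDFS
      DistributedCache.addFileToClassPath(dst, conf);
    }
  }
}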

Finally, is there a JIRA for this issue that you know of? If not, I'll create one. This is going to become a pain point for us soon, so if we have a design I might be able to contribute a patch.

thanks,
Bill

On Tue, Dec 22, 2009 at 2:14 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
> On 12/22/09 1:36 PM, "Bill Graham" <billgraham@gmail.com> wrote:

> > I've written my own Processor to handle my log format per this wiki and I've
> > run into a couple of gotchas:
> > http://wiki.apache.org/hadoop/DemuxModification
> >
> > 1. The default processor is not the TsProcessor as documented, but the
> > DefaultProcessor (see line 83 of Demux.java). This causes headaches because
> > when using DefaultProcessor, data always goes under minute "0" in HDFS,
> > regardless of when in the hour it was created.
> >

> There is a generic method to build the record, like:
>
> buildGenericRecord(record, recordEntry, timestamp, recordType);
>
> This method will build up a key like:
>
> Time partition/Primary Key/timestamp
>
> When all records are rolled up into a large sequence file at the end of the
> hour and the end of the day, the sequence file is sorted by time partition
> and primary key. This arrangement of the data structure was put in place to
> assist data scanning. When data is retrieved, use record.getTimestamp() to
> find the real timestamp for the record.
>
> TsProcessor is incomplete for now because the key in ChukwaRecord is used in
> the hourly and daily roll-up. Without buildGenericRecord, the hourly and
> daily roll-up will not work correctly.

> > 2. When implementing a custom parser as shown in the wiki, how do you register
> > the class so it gets included in the job that's submitted to the Hadoop
> > cluster? The only way I've been able to do this is to put my class in the
> > package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and then
> > manually add that class to the chukwa-core-0.3.0.jar that is on my data
> > processor, which is a pretty rough hack. Otherwise, I get class-not-found
> > exceptions in my mapper.

> The demux process is controlled by $CHUKWA_HOME/conf/chukwa-demux-conf.xml,
> which maps the recordType to your parser class. There is a plan to load
> parser classes from the classpath using Java annotations. It is still in the
> initial phase of planning. Design participation is welcome. Hope this
> helps. :)

> Regards,
> Eric
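To put Eric's buildGenericRecord description into code: a minimal custom processor in the style of the DemuxModification wiki might look like the sketch below. It assumes the AbstractProcessor base class with its protected chunk and key fields and the buildGenericRecord(record, body, timestamp, dataType) signature quoted above; MyLogProcessor and its date format are made up for the example.

package org.apache.hadoop.chukwa.extraction.demux.processor.mapper;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyLogProcessor extends AbstractProcessor {

  // Hypothetical log line prefix: "2009-12-22 14:40:37,123 ..."
  private final SimpleDateFormat sdf =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

  @Override
  protected void parse(String recordEntry,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) throws Throwable {
    // Pull the real event time out of the log line...
    Date d = sdf.parse(recordEntry.substring(0, 23));

    ChukwaRecord record = new ChukwaRecord();
    // ...and let buildGenericRecord construct the
    // "time partition/primary key/timestamp" key described above,
    // so the hourly and daily roll-ups sort correctly.
    buildGenericRecord(record, recordEntry, d.getTime(),
        chunk.getDataType());
    output.collect(key, record);
  }
}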

