From: Bill Graham <billgraham@gmail.com>
To: Eric Yang <eyang@yahoo-inc.com>
Cc: chukwa-user@hadoop.apache.org
Date: Tue, 22 Dec 2009 14:40:37 -0800
Subject: Re: How to deploy a custom processor to demux

Thanks for your quick reply, Eric.

The TsProcessor does use buildGenericRecord and has been working fine for me (at least I thought it was). I've mapped it to my dataType as you described without problems. My only point with issue #1 was just that the documentation is off and that the DefaultProcessor yields what I think is unexpected behavior.
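For the archives, the mapping entry in $CHUKWA_HOME/conf/chukwa-demux-conf.xml looks roughly like this. It's a sketch assuming the property name is the recordType; MyDataType is a stand-in for my real data type:

<!-- Sketch: map the record type MyDataType to TsProcessor.
     MyDataType is a placeholder; the name-is-recordType
     convention is assumed here. -->
<property>
  <name>MyDataType</name>
  <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.TsProcessor</value>
  <description>Parser class for the MyDataType record type</description>
</property>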

> There is a plan to load parser classes from the classpath using Java annotations.
> It is still in the initial phase of planning. Design participation is welcome.

Yes, annotations would be useful. Or what about just having an extensions directory (maybe lib/ext/) or something similar, where custom jars could be placed to be submitted with the demux M/R job? Do you know where the code resides that handles adding the chukwa-core jar? I poked around a bit but couldn't find it.
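To sketch what I have in mind (everything here is hypothetical, not existing Chukwa code: the lib/ext convention, the DemuxExtensions class, and the HDFS staging path), the demux driver could push extension jars onto the job classpath via Hadoop's DistributedCache:

import java.io.File;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class DemuxExtensions {

  // Hypothetical helper: ship every jar found in lib/ext/ with the
  // demux M/R job by staging it in HDFS and adding it to the job's
  // classpath, so custom processors need not be patched into
  // chukwa-core.
  public static void addExtensionJars(JobConf conf, File extDir)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    File[] jars = extDir.listFiles();
    if (jars == null) {
      return; // no extensions directory present
    }
    for (File jar : jars) {
      if (!jar.getName().endsWith(".jar")) {
        continue;
      }
      Path src = new Path(jar.getAbsolutePath());
      Path dst = new Path("/tmp/demux-ext/" + jar.getName());
      fs.copyFromLocalFile(src, dst); // stage the jar in HDFS
      DistributedCache.addFileToClassPath(dst, conf);
    }
  }
}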

Finally, is there a JIRA for this issue that you know of? If not, I'll create one. This is going to become a pain point for us soon, so if we have a design I might be able to contribute a patch.

thanks,
Bill

On Tue, Dec 22, 2009 at 2:14 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
> On 12/22/09 1:36 PM, "Bill Graham" <billgraham@gmail.com> wrote:

> > I've written my own Processor to handle my log format per this wiki and I've
> > run into a couple of gotchas:
> > http://wiki.apache.org/hadoop/DemuxModification
> >
> > 1. The default processor is not the TsProcessor as documented, but the
> > DefaultProcessor (see line 83 of Demux.java). This causes headaches because
> > when using DefaultProcessor, data always goes under minute "0" in HDFS,
> > regardless of when in the hour it was created.
> >

> There is a generic method to build the record, like:
>
> buildGenericRecord(record, recordEntry, timestamp, recordType);
>
> This method will build up a key like:
>
> Time partition/Primary Key/timestamp
>
> When all records are rolled up into a large sequence file at the end of the
> hour and the end of the day, the sequence file is sorted by time partition
> and primary key. This arrangement of the data structure was put in place to
> assist data scanning. When data is retrieved, use record.getTimestamp() to
> find the real timestamp for the record.
>
> TsProcessor is incomplete for now because the key in ChukwaRecord is used in
> the hourly and daily roll-up. Without buildGenericRecord, the hourly and
> daily roll-up will not work correctly.

> > 2. When implementing a custom parser as shown in the wiki, how do you register
> > the class so it gets included in the job that's submitted to the Hadoop
> > cluster? The only way I've been able to do this is to put my class in the
> > package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and then
> > manually add that class to the chukwa-core-0.3.0.jar that is on my data
> > processor, which is a pretty rough hack. Otherwise, I get class-not-found
> > exceptions in my mapper.

> The demux process is controlled by $CHUKWA_HOME/conf/chukwa-demux-conf.xml,
> which maps the recordType to your parser class. There is a plan to load
> parser classes from the classpath using Java annotations. It is still in the
> initial phase of planning. Design participation is welcome. Hope this
> helps. :)

> Regards,
> Eric
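To put Eric's buildGenericRecord description into code: a minimal custom processor in the style of the DemuxModification wiki might look like the sketch below. It assumes the AbstractProcessor base class with its protected chunk and key fields and the buildGenericRecord(record, body, timestamp, dataType) signature quoted above; MyLogProcessor and its date format are made up for the example.

package org.apache.hadoop.chukwa.extraction.demux.processor.mapper;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyLogProcessor extends AbstractProcessor {

  // Hypothetical log line prefix: "2009-12-22 14:40:37,123 ..."
  private final SimpleDateFormat sdf =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

  @Override
  protected void parse(String recordEntry,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) throws Throwable {
    // Pull the real event time out of the log line...
    Date d = sdf.parse(recordEntry.substring(0, 23));

    ChukwaRecord record = new ChukwaRecord();
    // ...and let buildGenericRecord construct the
    // "time partition/primary key/timestamp" key described above,
    // so the hourly and daily roll-ups sort correctly.
    buildGenericRecord(record, recordEntry, d.getTime(),
        chunk.getDataType());
    output.collect(key, record);
  }
}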

