incubator-chukwa-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: How to deploy a custom processor to demux
Date Tue, 22 Dec 2009 22:40:37 GMT
Thanks for your quick reply Eric.

The TsProcessor does use buildGenericRecord and has been working fine for me
(at least I thought it was). I've mapped it to my dataType as you described
without problems. My only point with issue #1 was just that the
documentation is off and that the DefaultProcessor yields what I think is
unexpected behavior.

> There is a plan to load parser classes from the classpath using Java
> annotations.
> It is still in the initial planning phase.  Design participation is
> welcome.

Yes, annotations would be useful. Or what about having an extensions
directory (maybe lib/ext/) or something similar where custom jars could be
placed to be submitted with the demux M/R job? Do you know where the code
resides that handles adding the chukwa-core jar? I poked around a bit but
couldn't find it.

Finally, is there a JIRA for this issue that you know of? If not I'll create
one. This is going to become a pain point for us soon, so if we have a
design I might be able to contribute a patch.

thanks,
Bill


On Tue, Dec 22, 2009 at 2:14 PM, Eric Yang <eyang@yahoo-inc.com> wrote:

> On 12/22/09 1:36 PM, "Bill Graham" <billgraham@gmail.com> wrote:
>
> > I've written my own Processor to handle my log format per this wiki and
> I've
> > run into a couple of gotchas:
> > http://wiki.apache.org/hadoop/DemuxModification
> >
> > 1. The default processor is not the TsProcessor as documented, but the
> > DefaultProcessor (see line 83 of Demux.java). This causes headaches
> because
> > when using DefaultProcessor, data always goes under minute "0" in HDFS,
> > regardless of when in the hour it was created.
> >
>
> There is a generic method to build the record, like:
>
> buildGenericRecord(record, recordEntry, timestamp, recordType);
>
> This method builds up a key like:
>
> Time partition/Primary Key/timestamp
>
> When all records are rolled up into a large sequence file at the end of the
> hour and the end of the day, the sequence file is sorted by time partition
> and primary key.  This data layout was put in place to assist data
> scanning.  When data is retrieved, use record.getTimestamp() to find the
> real timestamp for the record.
>
> TsProcessor is incomplete for now because the key in ChukwaRecord is used
> in the hourly and daily roll up.  Without using buildGenericRecord, the
> hourly and daily roll up will not work correctly.
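[A small, self-contained sketch of the "time partition/primary key/timestamp"
key layout described above. This is not Chukwa's actual code; the
hour-boundary partition and the "MyType" record type are assumptions made for
illustration only.]

```java
// Illustrative sketch of the composite key buildGenericRecord is described
// as producing: timePartition/primaryKey/timestamp. The real implementation
// lives in Chukwa's mapper processor classes.
public class KeySketch {

    // Assumption: the time partition is the start of the hour containing ts.
    static long hourPartition(long ts) {
        return ts - (ts % 3600000L); // 3,600,000 ms = 1 hour
    }

    static String buildKey(String primaryKey, long ts) {
        return hourPartition(ts) + "/" + primaryKey + "/" + ts;
    }

    public static void main(String[] args) {
        // Timestamp of this message: Tue, 22 Dec 2009 22:40:37 GMT
        long ts = 1261521637000L;
        // Sorting by this key groups records by hour, then by primary key;
        // the exact event time is still recoverable from the trailing ts
        // (analogous to record.getTimestamp()).
        System.out.println(buildKey("MyType", ts));
    }
}
```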
>
> > 2. When implementing a custom parser as shown in the wiki, how do you
> register
> > the class so it gets included in the job that's submitted to the hadoop
> > cluster? The only way I've been able to do this is to put my class in the
> > package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and
> then
> > manually add that class to the chukwa-core-0.3.0.jar that is on my data
> > processor, which is a pretty rough hack. Otherwise, I get class not found
> > exceptions in my mapper.
>
> The demux process is controlled by $CHUKWA_HOME/conf/chukwa-demux-conf.xml,
> which maps the recordType to your parser class.  There is a plan to load
> parser classes from the classpath using Java annotations.  It is still in
> the initial planning phase.  Design participation is welcome.  Hope this
> helps.  :)
>
> Regards,
> Eric
>
>
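[For concreteness, the chukwa-demux-conf.xml mapping Eric describes would look
roughly like the fragment below. The "MyLogType" record type is made up for
illustration, and the exact property shape is an assumption based on his
description; verify it against the conf file shipped with your Chukwa build.]

```xml
<!-- Hypothetical mapping: recordType "MyLogType" -> parser class.
     TsProcessor is used here as the parser, as in Bill's setup. -->
<property>
  <name>MyLogType</name>
  <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.TsProcessor</value>
</property>
```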
