incubator-chukwa-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: Writing another data process
Date Wed, 10 Mar 2010 04:28:11 GMT
Hi Oded,

Chukwa 0.3 does not support external class files.  In TRUNK, you can
create your own parser to run in demux.  A mapper parser class should extend
org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor, and a
reducer class should implement
org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor.  Then
edit CHUKWA_CONF/chukwa-demux-conf.xml and map each RecordType to your
class name.
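To illustrate the idea of routing each RecordType to its own parser, here is a minimal, self-contained Java sketch. It does not link against Chukwa: the `PARSERS` map stands in for the RecordType-to-class mapping that chukwa-demux-conf.xml provides, and `parseKeyValueLine` stands in for the per-record logic you would put in your AbstractProcessor subclass. The class name `DemuxRegistrySketch`, the record type `MyAppLog`, and the `key=value` log format are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Chukwa's API: models "RecordType -> parser" routing.
class DemuxRegistrySketch {

    // Stand-in for chukwa-demux-conf.xml: RecordType name -> parser.
    // "MyAppLog" is a made-up record type used only for illustration.
    static final Map<String, Function<String, Map<String, String>>> PARSERS = new HashMap<>();

    static {
        PARSERS.put("MyAppLog", DemuxRegistrySketch::parseKeyValueLine);
    }

    // Stand-in for mapper-side parsing: turns "k1=v1 k2=v2" into ordered fields.
    // Tokens without '=' (or with an empty key) are skipped.
    static Map<String, String> parseKeyValueLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    // Looks up the parser configured for a record type and applies it.
    // This sketch simply rejects record types that have no parser configured.
    static Map<String, String> demux(String recordType, String body) {
        Function<String, Map<String, String>> parser = PARSERS.get(recordType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser configured for " + recordType);
        }
        return parser.apply(body);
    }
}
```

For example, `demux("MyAppLog", "host=web01 level=INFO")` returns a map with the entries `host=web01` and `level=INFO`. Keeping the mapping as data rather than code mirrors why the reply points at the config file: adding a new record type means adding a parser and one mapping entry, not changing the dispatcher.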

Once you have both your class files and chukwa-demux-conf.xml, put your jar
file in hdfs://namenode:port/chukwa/demux, and the next demux job will pick
up the parsers and run them automatically.  Duplicate detection should be
handled by your mapper or reducer class, or by a post-demux step; Chukwa
does not currently offer duplicate detection.  Hope this helps.
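Since duplicate detection is left to your own code, here is one minimal way a post-demux step could do it, sketched in plain Java. It assumes (an assumption, not something stated above) that a duplicate is any record sharing the same source, stream name and sequence ID as one already seen, as happens when the same chunk is delivered twice. `DedupSketch` and `Rec` are hypothetical names, not Chukwa classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-demux dedup step; Chukwa does not provide this itself.
class DedupSketch {

    // Minimal stand-in for a demuxed record (not Chukwa's ChukwaRecord).
    static final class Rec {
        final String source;
        final String streamName;
        final long seqId;
        final String body;

        Rec(String source, String streamName, long seqId, String body) {
            this.source = source;
            this.streamName = streamName;
            this.seqId = seqId;
            this.body = body;
        }
    }

    // Assumed identity of a record: (source, streamName, seqId); body is ignored.
    // '\u0000' separates the parts so different triples cannot collide.
    private static String identity(Rec r) {
        return r.source + '\u0000' + r.streamName + '\u0000' + r.seqId;
    }

    // Keeps the first occurrence of each identity and preserves input order.
    static List<Rec> dedup(List<Rec> records) {
        Set<String> seen = new HashSet<>();
        List<Rec> out = new ArrayList<>();
        for (Rec r : records) {
            if (seen.add(identity(r))) {
                out.add(r);
            }
        }
        return out;
    }
}
```

The in-memory set only suits a single reduce group or a modest batch. At larger scale, one option is to make the identity part of the MapReduce key so that duplicates are shuffled to the same reduce call, where the reducer can emit just the first one.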

Regards,
Eric



On 3/9/10 1:01 PM, "Oded Rosen" <oded@legolas-media.com> wrote:

> Hi,
> 
> I wonder if one can write an additional data process (in addition to the Demux
> + Archiving processes).
> The option of writing a plug-in demux class is available, but can I write
> another process of my own to run in parallel to the demux+archiving, on the
> same data?
> What does it take?
> What classes should be inherited?
> How do I configure it (e.g. tell Chukwa to apply it to every piece of data)?
> Do I have to deal with duplications myself?
> 
> Thanks a lot,

