incubator-chukwa-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: Chukwa integration for Legolas Media real-time servers
Date Mon, 22 Feb 2010 17:31:02 GMT
Hi Oded,

If you are using the code from TRUNK, here are the instructions:

- Package your mapper and reducer classes into a jar file.
- Upload the parser jar file to hdfs://host:port/chukwa/demux
- Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
that references your class names in the Demux aliases section.

If you are using Chukwa 0.3.0, here are the instructions:

- Package your mapper and reducer classes into chukwa-core-0.3.0.jar
- Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
that references your class names in the Demux aliases section.
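
For reference, the field extraction your mapper performs could look roughly
like the sketch below. It is plain Java with no Chukwa or Hadoop dependency,
so it shows only the parsing of a fieldname=value<tab>fieldname=value line,
not the Demux processor API itself. The class name FieldExtractor, the method
toHiveRow, the sample field names, and the choice of a tab as the output
delimiter are all assumptions made for illustration; the Hive table would
need to be declared with a matching field delimiter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: not part of Chukwa. It only illustrates the
// parsing step a custom mapper would apply to each raw line.
public class FieldExtractor {

    // Assumed output delimiter; must match the Hive table definition.
    static final String OUTPUT_DELIMITER = "\t";

    // Split "name1=value1<tab>name2=value2..." into a name -> value map.
    // Tokens without a field name before '=' are skipped.
    static Map<String, String> parse(String rawLine) {
        Map<String, String> fields = new HashMap<>();
        for (String token : rawLine.trim().split("\t")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return fields;
    }

    // Keep only the wanted fields, in the given order. A field missing
    // from the input becomes an empty string (an assumption).
    static String toHiveRow(String rawLine, List<String> wanted) {
        Map<String, String> fields = parse(rawLine);
        return wanted.stream()
                .map(name -> fields.getOrDefault(name, ""))
                .collect(Collectors.joining(OUTPUT_DELIMITER));
    }

    public static void main(String[] args) {
        // Sample field names are made up for this example.
        String raw = "user=alice\tcountry=IL\tbytes=512\n";
        System.out.println(toHiveRow(raw, Arrays.asList("user", "bytes")));
    }
}
```

Keeping this logic in a standalone helper lets you test it outside the
cluster before wiring it into your mapper class.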

Hope this helps.

Regards,
Eric

On 2/22/10 7:28 AM, "Oded Rosen" <oded@legolas-media.com> wrote:

> I have just sent this mail to Ari, but it is probably wise to share it with
> all of you:
> 
> Hello Ari,
> I'm Oded Rosen, with the Legolas Media R&D team.
> We would like to use Chukwa to pass data from our real time servers into our
> hadoop cluster. The dataflow already reaches several GB/day, and we are about
> to extend this in the near future.
> Our main aim is to process raw data (in the form of
> fieldname1=value1<tab>fieldname2=value2....\n) into a format that fits
> straight into Hive, for later processing.
> 
> We are already running a DirTailingAdaptor on our input directory, and receive
> the collected data in the chukwa/logs dir.
> Now, we would like to write our own Demux processor, in order to process the
> sink data, get only the fields we need from it, format the data and write it
> to the output directory, which will be defined as the input directory of a
> Hive table.
> 
> We have already written mapper/reducer classes that know how to extract the
> wanted fields from the raw data and apply the needed formats.
> We want to set a Demux processor with these classes as the map/reduce classes,
> but we could not find any documentation about how to do it.
> All we have managed so far is to run the default demux, which just copies the
> data into the output directory.
> We would appreciate any help you can offer us.

