chukwa-user mailing list archives

From Oded Rosen
Subject Re: Chukwa integration for Legolas Media real-time servers
Date Tue, 23 Feb 2010 17:46:59 GMT
Thanks Eric,
I have managed to write my own processor and to get the output as
ChukwaRecords with our own customized fields in them.
Now, I get to the part where I try to load this output into Hive (or
actually use the output dir, /repos, as the data directory of a Hive table).
At this stage I need to let Hive recognize the ChukwaRecordKey +
ChukwaRecord SerDes, so I need your help with that.

I've seen that integration with Pig is pretty straightforward for Chukwa
(using Chukwa-Pig.jar), but our idea is to automate the whole process
straight into a table, and with Hive you can just define a directory as a
Hive table input. If we could get the data in a form that Hive can
recognize, we would not need another stage after the Demux.

Can you think of a way to do this?


On Mon, Feb 22, 2010 at 7:31 PM, Eric Yang wrote:

> Hi Oded,
> If you are using the code from TRUNK, the instructions are:
> - Package your mapper and reducer classes, and put them in a jar file.
> - Upload the parser jar file to hdfs://host:port/chukwa/demux
> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
> referencing your class names in the Demux aliases section.
> If you are using Chukwa 0.3.0, the instructions are:
> - Package your mapper and reducer classes into chukwa-core-0.3.0.jar
> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
> referencing your class names in the Demux aliases section.
> Hope this helps.
> Regards,
> Eric
> On 2/22/10 7:28 AM, "Oded Rosen" wrote:
> > I have just sent this mail to Ari, but it is probably wise to share it
> > with all of you:
> >
> > Hello Ari,
> > I'm Oded Rosen, with Legolas Media R&D team.
> > We would like to use Chukwa to pass data from our real-time servers into
> > our Hadoop cluster. The dataflow already reaches several GB/day, and we
> > are about to extend this in the near future.
> > Our main aim is to process raw data (in the form of
> > fieldname1=value1<tab>fieldname2=value2....\n) into a format that fits
> > straight into Hive, for later processing.
> >
> > We are already running a DirTailingAdaptor on our input directory, and
> > receive the collected data in the chukwa/logs dir.
> > Now, we would like to write our own Demux processor, in order to process
> > the sink data, get only the fields we need from it, format the data and
> > write it to the output directory, which will be defined as the input
> > directory of a Hive table.
> >
> > We have already written mapper/reducer classes that know how to extract
> > the wanted fields from the raw data and apply the needed formats.
> > We want to set a Demux processor with these classes as the map/reduce
> > classes, but we could not find any documentation about how to do it.
> > All we have managed so far is to run the default demux, which just copies
> > the data into the output directory.
> > We would appreciate any help you can offer us.
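
For illustration only, here is a minimal sketch of the field extraction described in the original message: parsing a raw line of the form fieldname1=value1<tab>fieldname2=value2 and writing only the wanted fields as a delimited text row. It is plain Java and uses no Chukwa or Hive classes, so it is not the processor written in this thread. The class name, method names, and sample field names (ts, user, url, referrer) are hypothetical. It assumes the target table is a text table using Hive's default field delimiter (Ctrl-A) and default NULL marker (\N).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Chukwa or Hive.
final class KeyValueLineParser {

    // Assumption: Hive text table with the default field delimiter, Ctrl-A.
    static final String HIVE_DELIM = "\u0001";
    // Assumption: default NULL marker for Hive text tables.
    static final String HIVE_NULL = "\\N";

    /** Splits "name1=value1<TAB>name2=value2" into an ordered name/value map. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.split("\t")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip tokens with no '=' or with an empty field name
            }
            // Split on the first '=' only, so values may themselves contain '='.
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    /** Emits the wanted fields, in the given order, as one delimited row. */
    static String toHiveRow(Map<String, String> fields, List<String> wanted) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < wanted.size(); i++) {
            if (i > 0) {
                row.append(HIVE_DELIM);
            }
            String value = fields.get(wanted.get(i));
            row.append(value == null ? HIVE_NULL : value);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String raw = "ts=1266947219\tuser=42\turl=/index.html\tdebug=1";
        List<String> wanted = Arrays.asList("ts", "user", "referrer");
        String row = toHiveRow(parse(raw), wanted);
        // Replace Ctrl-A with '|' so the row is readable on a terminal.
        System.out.println(row.replace(HIVE_DELIM, "|")); // prints 1266947219|42|\N
    }
}
```

In a real Demux processor the same logic would sit inside the mapper, and the column order in the wanted list would have to match the column order of the Hive table definition.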

