incubator-chukwa-user mailing list archives

From: Eric Yang <>
Subject: Re: Chukwa integration for Legolas Media real-time servers
Date: Tue, 23 Feb 2010 18:53:09 GMT
I have not studied Hive in depth. Jerome said he has done this; perhaps he
could share his experience.


On 2/23/10 9:46 AM, "Oded Rosen" <> wrote:

> Thanks Eric,
> I have managed to write my own processor and to get the output as
> ChukwaRecords with our own customized fields in them.
> Now I have reached the stage where I try to load this output into Hive (or
> actually use the output dir, /repos, as the data directory of a Hive table).
> At this stage I need to let Hive recognize the ChukwaRecordKey + ChukwaRecord
> SerDes, so I need your help with that.
> I've seen that integration with Pig is pretty straightforward for Chukwa (using
> Chukwa-Pig.jar), but our idea is to automate the whole process straight into a
> table, and with Hive you can simply define a directory as a table's input. If
> we could get the data into a form that Hive can recognize, we would not need
> another stage after the Demux.
> Can you think of a way to do this?
> Thanks,
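> To illustrate, the table definition we are hoping for would look roughly like
> the sketch below. The SerDe class name is purely hypothetical (it is exactly
> the piece we are missing), and the column names and location are made up:
>
> CREATE EXTERNAL TABLE chukwa_records (
>   fieldname1 STRING,
>   fieldname2 STRING
> )
> ROW FORMAT SERDE 'com.example.hive.ChukwaRecordSerDe'
> STORED AS SEQUENCEFILE
> LOCATION '/repos/OurRecordType';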
> On Mon, Feb 22, 2010 at 7:31 PM, Eric Yang <> wrote:
>> Hi Oded,
>> If you are using the code from TRUNK, the instructions are:
>> - Package your mapper and reducer classes and put them in a jar file.
>> - Upload the parser jar file to hdfs://host:port/chukwa/demux
>> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml and add a new record type
>> referencing your class names in the Demux aliases section, for example:
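>> An entry in the aliases section looks roughly like this (it follows the usual
>> Hadoop configuration format; substitute your own record type and class name):
>>
>> <property>
>>   <name>MyRecordType</name>
>>   <value>com.yourcompany.chukwa.MyRecordTypeProcessor</value>
>>   <description>Demux mapper class for the MyRecordType record type</description>
>> </property>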
>> If you are using Chukwa 0.3.0, the instructions are:
>> - Package your mapper and reducer classes into chukwa-core-0.3.0.jar
>> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml and add a new record type
>> referencing your class names in the Demux aliases section, in the same way.
>> Hope this helps.
>> Regards,
>> Eric
>> On 2/22/10 7:28 AM, "Oded Rosen" <> wrote:
>>> I have just sent this mail to Ari, but it is probably wise to share it with
>>> all of you:
>>> Hello Ari,
>>> I'm Oded Rosen, from the Legolas Media R&D team.
>>> We would like to use Chukwa to pass data from our real-time servers into our
>>> Hadoop cluster. The data flow already reaches several GB/day, and we are about
>>> to extend this in the near future.
>>> Our main aim is to process raw data (in the form of
>>> fieldname1=value1<tab>fieldname2=value2....\n) into a format that fits
>>> straight into Hive, for later processing.
>>> We are already running a DirTailingAdaptor on our input directory, and receive
>>> the collected data in the chukwa/logs dir.
>>> Now, we would like to write our own Demux processor, in order to process the
>>> sink data, get only the fields we need from it, format the data and write it
>>> to the output directory, which will be defined as the input directory of a
>>> Hive table.
>>> We have already written mapper/reducer classes that know how to extract the
>>> wanted fields from the raw data and apply the needed formats.
>>> We want to set up a Demux processor with these classes as the map/reduce
>>> classes, but we could not find any documentation on how to do it.
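>>> To make the idea concrete, here is roughly the shape of the mapper side we
>>> have in mind; the class and method names are our best guess from reading the
>>> existing processors in the Chukwa source, and the parsing is simplified:
>>>
>>> import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
>>> import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
>>> import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
>>> import org.apache.hadoop.mapred.OutputCollector;
>>> import org.apache.hadoop.mapred.Reporter;
>>>
>>> public class KeyValueProcessor extends AbstractProcessor {
>>>   @Override
>>>   protected void parse(String recordEntry,
>>>       OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
>>>       Reporter reporter) throws Throwable {
>>>     ChukwaRecord record = new ChukwaRecord();
>>>     // Split a raw line of the form name1=value1<tab>name2=value2...
>>>     for (String pair : recordEntry.split("\t")) {
>>>       int eq = pair.indexOf('=');
>>>       if (eq > 0) {
>>>         record.add(pair.substring(0, eq), pair.substring(eq + 1));
>>>       }
>>>     }
>>>     // key, archiveKey and chunk appear to be inherited from AbstractProcessor
>>>     buildGenericRecord(record, recordEntry,
>>>         archiveKey.getTimePartition(), chunk.getDataType());
>>>     output.collect(key, record);
>>>   }
>>> }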
>>> All we have managed to do so far is run the default demux, which just copies
>>> the data into the output directory.
>>> We would appreciate any help you can offer.
