incubator-chukwa-user mailing list archives

From Corbin Hoenes
Subject Re: Using PIG for processing Chuckwa files
Date Thu, 04 Feb 2010 18:31:34 GMT

Yes, there is Pig support.  I am just learning how to use it, but with some help from people
on this list I have been successful in using Pig to analyze Chukwa-collected logs.

In ${CHUKWA_HOME}/contrib/chukwa-pig/ you'll find chukwa-pig.jar, which contains the ChukwaStorage loader.

Once you have that you can use it like this:

register /[your chukwa path here]/chukwa-core-0.3.0.jar
register /[your udf path here]/lib/chukwa-pig.jar

records = LOAD '$in_file' using org.apache.hadoop.chukwa.ChukwaStorage() as (ts:long, fields);
named_records = FOREACH records GENERATE
    fields#'URI' as uri,
    fields#'RECORD_TYPE' as type,
    fields#'CLIENT_IP_ADDRESS' as ip;
dump named_records;
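
To make the shape of the data concrete, here is a rough Python sketch of what the FOREACH step does to each record. It is not Chukwa or Pig code: the sample values and the helper name `project` are made up for illustration. It assumes each loaded record is a timestamp plus a map of field names to values, and that looking up a missing key yields null, as a Pig map lookup does.

```python
# Illustration only: mimics the FOREACH ... GENERATE projection in plain Python.
# The record contents below are invented sample data, not real Chukwa output.

def project(record):
    """Pick URI, RECORD_TYPE and CLIENT_IP_ADDRESS out of the fields map."""
    ts, fields = record
    # dict.get returns None for a missing key, similar to Pig returning null.
    return (
        fields.get("URI"),
        fields.get("RECORD_TYPE"),
        fields.get("CLIENT_IP_ADDRESS"),
    )

records = [
    (1265308294000, {"URI": "/index.html",
                     "RECORD_TYPE": "access",
                     "CLIENT_IP_ADDRESS": "10.0.0.1"}),
    (1265308295000, {"URI": "/login",
                     "RECORD_TYPE": "access"}),  # no client IP in this record
]

named_records = [project(r) for r in records]
for row in named_records:
    print(row)
```

The second sample record has no CLIENT_IP_ADDRESS, so its last column comes out empty. A Pig script that relies on such a field would probably want to filter out nulls first.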

Chukwa files are in sequence file format and store "ChukwaRecord" objects, each of which holds
key/value pairs.  You can organize your data in the ChukwaRecords in a custom format if needed by
using a custom processor for your data type.  The example above shows a few custom fields, like
URI, that were parsed out of the log files by a processor.  This can make it a bit easier for your
Pig scripts to get data out.
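
As a rough idea of what such a processor does, the Python sketch below turns one log line into named fields. Real Chukwa processors are Java classes, so this only models the idea. The log layout ("<client ip> <uri>"), the function name `parse_line`, the "access" record type and the fallback "body" key are all assumptions made for this example.

```python
import re

# Hypothetical log layout: "<client ip> <uri>".
# A real processor would match whatever format your logs actually use.
LINE_PATTERN = re.compile(r"^(?P<ip>\S+)\s+(?P<uri>\S+)$")

def parse_line(line, record_type="access"):
    """Return a dict of named fields parsed from one log line."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        # Keep the raw line so that nothing is lost when parsing fails.
        return {"RECORD_TYPE": record_type, "body": line}
    return {
        "RECORD_TYPE": record_type,
        "URI": match.group("uri"),
        "CLIENT_IP_ADDRESS": match.group("ip"),
    }

fields = parse_line("10.0.0.1 /index.html")
print(fields)
```

With fields named at collection time like this, the Pig script only has to look up fields#'URI' and so on, and doesn't have to re-parse raw log text.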

On Feb 4, 2010, at 7:24 AM, Vincent Barat wrote:

> Hello,
> I'm currently evaluating Chuckwa and I wonder if there is a way to use PIG to map/reduce
> the files produced by Chuckwa?
> If yes, is there a special PIG loader to use?
> What is the format of Chuckwa files? Is it just a concatenation of all logs sent by the
> Thanks for your help.
