incubator-chukwa-user mailing list archives

From James Seigel <>
Subject Re: Directly create chukwa records?
Date Sat, 27 Feb 2010 04:04:59 GMT
It sounds like there is some exciting work being done on the demux process. I was just wondering
if you are planning to stay backwards compatible with the 0.3 format for /repos as you move forward?


On 2010-02-26, at 10:38 AM, Eric Yang wrote:

> On 2/26/10 4:43 AM, "Guillermo Pérez" <> wrote:
>> One related thing is that I want to modify the "cluster" where we put
>> the files, because we will receive syslog data with several types of
>> events that we want to store in different clusters to analyze, back up,
>> and archive separately. I have seen that you can modify the
>> Record.tagsField and that a regexp is used to extract the
>> destination cluster. This is a bit awkward, isn't it? I don't want to keep
>> a tagsField just for that. I'm using a field "event_type", and I have
>> modified the extraction/engine/, so if that field
>> exists, "event_" + <event_type> will be used as the cluster. Is this the
>> proper way to go, or is there a better solution?
> I don't think you need to modify for this purpose. The
> backfill Java program takes the cluster as its first parameter. Hence, you
> could simply pass event_type as the first parameter when you backfill.
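A minimal Python sketch of the cluster-selection rule discussed in this exchange, assuming records are plain dicts and that the tags string carries the cluster as `cluster="name"` (an assumption about the tag format, not taken from the Chukwa source). The names `cluster_for` and `group_by_cluster`, and the `"undefined"` fallback, are hypothetical. Rather than patching the engine, it groups records by destination cluster so that each batch could be backfilled separately with its cluster name passed as the first parameter, as Eric suggests.

```python
import re
from collections import defaultdict

# Assumed tag format: cluster="<name>" somewhere in the tags string.
CLUSTER_TAG = re.compile(r'cluster="([^"]*)"')

def cluster_for(record):
    """Pick a destination cluster for one record (a dict of fields).

    Hypothetical rule from this thread: an "event_type" field wins and maps
    to "event_<event_type>"; otherwise fall back to the cluster tag, and
    finally to the placeholder "undefined".
    """
    event_type = record.get("event_type")
    if event_type:
        return "event_" + event_type
    match = CLUSTER_TAG.search(record.get("tags", ""))
    return match.group(1) if match else "undefined"

def group_by_cluster(records):
    """Split records into one batch per cluster, preserving input order,
    so each batch can be backfilled with its cluster as the first parameter."""
    batches = defaultdict(list)
    for record in records:
        batches[cluster_for(record)].append(record)
    return dict(batches)

if __name__ == "__main__":
    sample = [
        {"event_type": "login", "body": "user a"},
        {"event_type": "error", "body": "disk full"},
        {"tags": 'cluster="demo"', "body": "plain syslog line"},
        {"body": "no tags at all"},
    ]
    for cluster, batch in group_by_cluster(sample).items():
        print(cluster, len(batch))
```

Keeping the cluster choice outside the demux engine means the engine stays unmodified; only the caller decides which cluster name each batch is backfilled under.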
>> Another question is where I could start looking to learn how to build
>> reports and aggregated results from the custom ChukwaRecords I'm
>> inserting.
> There is currently no formal solution for generating reports from ChukwaRecords.
> There is org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads
> ChukwaRecords into a MySQL database based on the mdl.xml file. After the data is
> loaded, you could use to start the web server and visualize the data
> in the Chukwa SQL Client widget. However, I must warn you that MetricDataLoader
> is deprecated, and the future plan for generating reports from ChukwaRecords is
> as follows:
> A post-demux data loader will wait for new ChukwaRecords files
> and merge them with the existing ChukwaRecords files through a second MR
> job. The second MR job will also produce a low-resolution version of the data for reports.
> /chukwa/repos/TYPE/DATE <-- Original data goes here.
> /chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON data
> goes here.
> The report JSON will be fixed at 300 data points per series, optimized for
> graphing. I am taking it slow on the actual implementation because
> ChukwaRecords should be moved to a faster serialization format. That is another
> area that needs to be improved for the future plan to work.
> Regards,
> Eric
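Eric's plan fixes each report series at 300 data points but does not say how the raw data would be reduced. As a hedged sketch only, the Python below assumes bucket averaging (one simple choice among several) and a hypothetical JSON shape with `series` and `data` keys; neither reflects an actual Chukwa implementation, and `downsample` and `report_json` are illustrative names.

```python
import json

MAX_POINTS = 300  # per-series cap stated in the thread

def downsample(points, max_points=MAX_POINTS):
    """Reduce time-ordered (timestamp, value) pairs to at most max_points.

    Assumed technique: split the series into max_points consecutive buckets
    of nearly equal size and average the values in each bucket. Each output
    point keeps the first timestamp of its bucket.
    """
    n = len(points)
    if n <= max_points:
        return list(points)
    result = []
    for i in range(max_points):
        # Integer bucket bounds; every bucket is non-empty because n > max_points.
        start = i * n // max_points
        end = (i + 1) * n // max_points
        bucket = points[start:end]
        avg = sum(value for _, value in bucket) / len(bucket)
        result.append((bucket[0][0], avg))
    return result

def report_json(series_name, points):
    """Serialize one downsampled series (hypothetical JSON schema)."""
    data = downsample(points)
    return json.dumps({"series": series_name,
                       "data": [[ts, value] for ts, value in data]})

if __name__ == "__main__":
    # One day of per-minute samples: 1440 points in, 300 points out.
    day = [(minute * 60, float(minute % 60)) for minute in range(1440)]
    print(len(json.loads(report_json("cpu", day))["data"]))
```

For example, one day of per-minute samples (1,440 points) collapses into 300 buckets of 4 or 5 samples each. Under this assumption the same function would be applied once per resolution directory (yearly, monthly, weekly, daily), each over its own time window.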

James Seigel
Captain Hammer
