chukwa-user mailing list archives

From Eric Yang <>
Subject Re: Directly create chukwa records?
Date Sat, 27 Feb 2010 20:51:44 GMT
If a new file format is chosen to replace sequence files, there will be a
converter from the sequence file format to the new format.


On 2/26/10 8:04 PM, "James Seigel" <> wrote:

> It sounds like there is some exciting work being done on the demux process.
> I was just wondering if you are planning to be backwards compatible with the
> 0.3 format for /repos as you move forward.
> Cheers
> james
> On 2010-02-26, at 10:38 AM, Eric Yang wrote:
>> On 2/26/10 4:43 AM, "Guillermo Pérez" <> wrote:
>>> One related thing is that I want to modify the "cluster" where we put
>>> the files, because we will receive syslog data with several types of
>>> events that we want to store in different clusters to analyze, backup,
>>> archive separately. I have seen that you can modify the
>>> Record.tagsField and that we use a regexp for extracting the
>>> destination cluster. This is a bit awkward, isn't it? I don't want to keep
>>> a tagsField just for that. I'm using a field "event_type" and I have
>>> modified the extraction/engine/, so if that field
>>> exists, "event_" + <event_type> will be used as the cluster. Is this the
>>> proper way to go, or is there a better solution?
>> I don't think you need to modify for this purpose.  The
>> backfill java program takes the cluster as its first parameter.  Hence, you
>> could easily pass event_type as the first parameter when you backfill.
>>> Another question is where I could start looking on how to build
>>> reports and aggregated results of the custom ChukwaRecords I'm
>>> inserting.
>> There is currently no formal solution to generate reports from ChukwaRecords.
>> There is org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads
>> ChukwaRecords into a MySQL database based on the mdl.xml file.  After data is
>> loaded, you could use to start the webserver, and visualize the data
>> in Chukwa SQL Client widget.  However, I must warn you that MetricDataLoader
>> is deprecated, and the future plan to generate reports from ChukwaRecords is
>> as follows:
>> Having a post demux data loader which waits to receive new ChukwaRecords
>> files, and merges them with the existing ChukwaRecords files through a second
>> MR job.  The second MR job also produces a low-resolution version of the data
>> for reports.
>> /chukwa/repos/TYPE/DATE <-- Original data goes here.
>> /chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON data
>> goes here.
>> The report JSON will be fixed to 300 data points per series, optimized for
>> graphing.  I am taking it slow on the actual implementation because
>> ChukwaRecords should be moved to a faster serialization format.  It's another
>> area that needs to be improved for the future plan to work.
>> Regards,
>> Eric
> James Seigel
> Captain Hammer
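
The report plan quoted above (summarized data under /chukwa/report/TYPE/[yearly,monthly,weekly,daily], fixed to 300 data points per series) could be sketched roughly as below. This is only an illustration, not Chukwa code: the function names `downsample` and `report_path`, the (timestamp, value) input shape, and the choice of averaging equal-width time buckets are all assumptions, since the thread does not say how the second MR job reduces the data.

```python
from statistics import mean

TARGET_POINTS = 300  # per-series size mentioned in the thread
REPORT_PERIODS = ("yearly", "monthly", "weekly", "daily")


def report_path(record_type, period):
    """Build a report directory following the layout quoted in the thread.
    (Hypothetical helper; only the path layout comes from the email.)"""
    if period not in REPORT_PERIODS:
        raise ValueError(f"unknown period: {period}")
    return f"/chukwa/report/{record_type}/{period}"


def downsample(points, target=TARGET_POINTS):
    """Reduce time-ordered (timestamp, value) pairs to at most `target`
    points by averaging the values in equal-width time buckets.
    Bucket averaging is an assumed strategy, not the thread's."""
    if target <= 0:
        raise ValueError("target must be positive")
    if len(points) <= target:
        return list(points)  # already small enough, keep as-is
    start, end = points[0][0], points[-1][0]
    width = (end - start) / target
    if width == 0:
        # All timestamps are identical: collapse to a single point.
        return [(start, mean(v for _, v in points))]
    buckets = {}
    for ts, value in points:
        # Clamp so the final timestamp lands in the last bucket.
        index = min(int((ts - start) / width), target - 1)
        buckets.setdefault(index, []).append(value)
    # Emit one point per non-empty bucket, placed at the bucket midpoint.
    return [(start + (i + 0.5) * width, mean(vals))
            for i, vals in sorted(buckets.items())]
```

For example, `downsample([(t, float(t)) for t in range(3000)])` yields one averaged point per bucket, and `report_path("SysLog", "daily")` gives the directory such a series might be written to ("SysLog" is just a placeholder type name).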
