incubator-chukwa-user mailing list archives

From "James Seigel" <ja...@tynt.com>
Subject Re: Directly create chukwa records?
Date Sat, 27 Feb 2010 21:28:07 GMT
Awesome!

Sent from my mobile. Please excuse the typos.

On 2010-02-27, at 1:52 PM, "Eric Yang" <eyang@yahoo-inc.com> wrote:

> There will be a converter from sequence files to the new file format,
> if a new file format is chosen to replace sequence files.
>
> Regards,
> Eric
>
>
> On 2/26/10 8:04 PM, "James Seigel" <james@tynt.com> wrote:
>
>> It sounds like there is some exciting work being done on the demux
>> process. I was just wondering if you are planning to stay backwards
>> compatible with the 0.3 format for /repos as you move forward.
>>
>> Cheers
>> james
>>
>>
>> On 2010-02-26, at 10:38 AM, Eric Yang wrote:
>>
>>>
>>>
>>>
>>> On 2/26/10 4:43 AM, "Guillermo Pérez" <bisho@tuenti.com> wrote:
>>>
>>>> One related thing is that I want to modify the "cluster" where we
>>>> put the files, because we will receive syslog data with several
>>>> types of events that we want to store in different clusters to
>>>> analyze, back up, and archive separately. I have seen that you can
>>>> modify the Record.tagsField and that a regexp is used to extract
>>>> the destination cluster. This is a bit awkward, isn't it? I don't
>>>> want to keep a tagsField just for that. I'm using a field
>>>> "event_type" and I have modified
>>>> extraction/engine/RecordUtil.java so that, if that field exists,
>>>> "event_" + <event_type> is used as the cluster. Is this the proper
>>>> way to go, or is there a better solution?
>>>
>>> I don't think you need to modify RecordUtil.java for this purpose.
>>> The backfill Java program takes the cluster as its first parameter.
>>> Hence, you could simply supply the event_type as the first
>>> parameter when you backfill.
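
To make the naming rule discussed above concrete, here is a minimal Python sketch. It is not Chukwa code: the function name `cluster_for`, the use of a plain dict as a record, and the fallback value are all hypothetical. It only shows the "event_" + <event_type> rule from Guillermo's message; the resulting name is what one would hand over as the cluster.

```python
def cluster_for(record, default="unknown"):
    """Return a destination cluster name for a record.

    `record` is a plain dict standing in for a ChukwaRecord (assumption).
    If it carries a non-empty "event_type" field, the cluster is
    "event_" + <event_type>; otherwise the hypothetical default is used.
    """
    event_type = record.get("event_type")
    if event_type:
        return "event_" + event_type
    return default


if __name__ == "__main__":
    print(cluster_for({"event_type": "login", "body": "..."}))
    print(cluster_for({"body": "no event type here"}))
```

Computing the name outside of RecordUtil.java keeps the stock extraction code untouched, which seems to be the point of Eric's suggestion.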
>>>
>>>> Another question is where I could start looking on how to build
>>>> reports and aggregated results of the custom ChukwaRecords I'm
>>>> inserting.
>>>
>>> There is currently no formal solution for generating reports from
>>> ChukwaRecords. There is
>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads
>>> ChukwaRecords into a MySQL database based on the mdl.xml file. After
>>> the data is loaded, you could use hicc.sh to start the web server and
>>> visualize the data in the Chukwa SQL Client widget. However, I must
>>> warn you that MetricDataLoader is deprecated, and the future plan for
>>> generating reports from ChukwaRecords is as follows:
>>>
>>> A post-demux data loader waits to receive new ChukwaRecords files
>>> and merges them with the existing ChukwaRecords files through a
>>> second MR job. The second MR job also produces a low-resolution
>>> version of the data for reports.
>>>
>>> /chukwa/repos/TYPE/DATE <-- Original data goes here.
>>> /chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON data goes here.
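
The directory layout above can be sketched as two small path helpers. This is only an illustration in Python: the helper names are hypothetical, and TYPE and DATE are treated as opaque strings because the thread does not say how DATE is formatted.

```python
import posixpath

# The four report resolutions named in the proposed layout.
RESOLUTIONS = ("yearly", "monthly", "weekly", "daily")


def repos_path(record_type, date):
    """Path for original data: /chukwa/repos/TYPE/DATE."""
    return posixpath.join("/chukwa/repos", record_type, date)


def report_path(record_type, resolution):
    """Path for summarized JSON data: /chukwa/report/TYPE/RESOLUTION."""
    if resolution not in RESOLUTIONS:
        raise ValueError("unknown resolution: %r" % (resolution,))
    return posixpath.join("/chukwa/report", record_type, resolution)


if __name__ == "__main__":
    # "SysLog" and "20100226" are made-up example values.
    print(repos_path("SysLog", "20100226"))
    print(report_path("SysLog", "daily"))
```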
>>>
>>> The report JSON will be fixed to 300 data points per series,
>>> optimized for graphing. I am taking it slow on the actual
>>> implementation because ChukwaRecords should be moved to a faster
>>> serialization format. That is another area that needs to be improved
>>> for the future plan to work.
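
As a rough illustration of the "300 data points per series" idea, here is a Python sketch that reduces a time series by averaging equal-sized buckets. The thread does not say how the second MR job will summarize data, so bucket averaging is an assumption, as are the function name `downsample` and the (timestamp, value) tuple layout. Series that are already shorter than the target are returned unchanged, which is also an assumption.

```python
def downsample(points, target=300):
    """Reduce a list of (timestamp, value) pairs to at most `target` points.

    Each output point uses the first timestamp of its bucket and the
    mean of the bucket's values (assumed summarization, not Chukwa's).
    """
    n = len(points)
    if n <= target:
        return list(points)
    out = []
    for i in range(target):
        # Integer bucket boundaries; every bucket is non-empty when n > target.
        start = i * n // target
        end = (i + 1) * n // target
        bucket = points[start:end]
        mean = sum(value for _, value in bucket) / len(bucket)
        out.append((bucket[0][0], mean))
    return out


if __name__ == "__main__":
    series = [(t, float(t)) for t in range(600)]
    reduced = downsample(series)
    print(len(reduced), reduced[0], reduced[-1])
```

A fixed point count keeps the JSON payload the same size no matter whether the graph covers a day or a year, which matches the "optimized for graphing" goal.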
>>>
>>> Regards,
>>> Eric
>>>
>>
>> James Seigel
>> james@tynt.com
>> http://www.tynt.com
>> Captain Hammer
>>
>>
>