incubator-chukwa-user mailing list archives

From Stuti Awasthi <Stuti_Awas...@persistent.co.in>
Subject RE: Problem in Output file format in FinalArchives
Date Tue, 18 May 2010 05:20:13 GMT
Thanks Jerome,

Actually, my use case is as follows:

Step 1: I give my logs to Chukwa, and it processes them to generate sequence files.

Step 2: I then feed these output sequence files to my MapReduce program, which converts them into
Avro format.

Problem in Step 1:

My MapReduce program expects <ChukwaRecordKey, ChukwaRecord> as input for further processing.

However, the final sequence files generated in the finalArchives folder are in the format
<ChukwaArchiveKey, ChunkImpl>. Instead, I want my final output to be in the format
<ChukwaRecordKey, ChukwaRecord> generated by the demuxer, so that I can process it further.

Please correct me if my understanding is wrong: logs are archived first, and the output of the
Archiver is then the input to the Demuxer, which generates the final output. If so, I should get
<ChukwaRecordKey, ChukwaRecord> as the final file.

Are there any configuration settings needed on the Chukwa side to achieve the desired output?

No problem in Step 2.

What is the correct behavior of this whole process? Any pointers regarding this would
be helpful.

Thanks in advance,
Stuti

-----Original Message-----
From: Jerome Boulon [mailto:jboulon@netflix.com] 
Sent: Monday, May 17, 2010 10:09 PM
To: chukwa-user@hadoop.apache.org
Subject: Re: Problem in Output file format in FinalArchives

Hi Stuti,

There are 2 outputs in Chukwa:
1- Collectors write SeqFiles in this format: <ChukwaArchiveKey,
ChunkImpl>
1.1- Archives are in the same format: <ChukwaArchiveKey,
ChunkImpl>
2- Demux output is in this format: <ChukwaRecordKey, ChukwaRecord>
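To check which of these two outputs a given file actually holds, one can read the key and value class names that Hadoop records in every SequenceFile header. The sketch below is a hypothetical, JDK-only helper (`SeqHeaderCheck` is not part of Chukwa or Hadoop). It assumes the header layout used by SequenceFile versions 4 and later: the bytes `SEQ`, a version byte, then the key and value class names, each written as a variable-length integer length followed by UTF-8 bytes. It only handles the single-byte length case (names shorter than 128 bytes), and it classifies files by simple class name so it does not depend on Chukwa package names.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reports the key/value classes named in a SequenceFile header. */
public class SeqHeaderCheck {

    // Reads a string written as a one-byte vint length plus UTF-8 bytes.
    static String readShortString(DataInputStream in) throws IOException {
        byte len = in.readByte();
        // Negative first bytes encode negative or multi-byte vints,
        // which this sketch does not handle.
        if (len < 0) {
            throw new IOException("multi-byte length not supported by this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** Returns {keyClassName, valueClassName} from the start of the stream. */
    public static String[] readClassNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        int version = in.readByte();
        if (version < 4) {
            throw new IOException("old header version " + version + " not handled");
        }
        String key = readShortString(in);
        String value = readShortString(in);
        return new String[] {key, value};
    }

    /** Labels the pair using the two formats described in this thread. */
    public static String describe(String keyClass, String valueClass) {
        if (keyClass.endsWith("ChukwaArchiveKey") && valueClass.endsWith("ChunkImpl")) {
            return "collector/archive output";
        }
        if (keyClass.endsWith("ChukwaRecordKey") && valueClass.endsWith("ChukwaRecord")) {
            return "demux output";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream f = new FileInputStream(path)) {
                String[] names = readClassNames(f);
                System.out.println(path + ": <" + names[0] + ", " + names[1] + "> -> "
                        + describe(names[0], names[1]));
            }
        }
    }
}
```

Run as `java SeqHeaderCheck <file>...`. Going by the description above, a file from finalArchives would be expected to report the archive pair, while only Demux output should report <ChukwaRecordKey, ChukwaRecord>.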

So if you want your Demux output in Avro format, you need to write your own
AvroOutputFormat in Demux.
I've already done some work to make it possible to use any Hadoop output format at
the Demux level, but I haven't published my code yet.

What is your time range?

/Jerome.


On 5/16/10 9:58 PM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:

> 
> 
> Hello Guys,
> 
> 
> I am a newbie to Chukwa, and I am trying to convert the Chukwa sequence files
> produced by the demuxer (<ChukwaRecordKey, ChukwaRecord> file format) to Avro
> format. I am currently using Chukwa 0.3.0.
> 
> I was able to set up and run Chukwa successfully on an Ubuntu machine; the agent
> and collector started successfully, and files were created in the finalArchives
> folder.
> 
> The files in finalArchives are of type <ChukwaArchiveKey, ChunkImpl>, but
> according to the Chukwa documentation and my findings, I think the files should
> be of format <ChukwaRecordKey, ChukwaRecord>.
> 
> I used chukwa-data-processors.sh to start the data processors and changed the
> archive.grouper property in chukwa-demux-conf.xml to Stream:
> <property>
> <name>archive.grouper</name>
> <value>Stream</value>
> <description>How to group archive files. Choices are Hourly, Daily, DataType,
> and Stream.</description>
> </property>
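Per its own description, archive.grouper only selects how archive files are grouped (Hourly, Daily, DataType, or Stream); none of these choices is described as changing the <ChukwaArchiveKey, ChunkImpl> format of the archives. A minimal JDK-only sketch of reading that property from a Hadoop-style <configuration> XML string and checking it against the four listed choices is below. `GrouperCheck` is a hypothetical class, not part of Chukwa, and the exact, case-sensitive comparison is an assumption about how Chukwa interprets the value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical checker for the archive.grouper setting. */
public class GrouperCheck {
    // The choices listed in the property's description.
    static final List<String> CHOICES = Arrays.asList("Hourly", "Daily", "DataType", "Stream");

    /** Returns the value of the named property, or null if it is absent. */
    public static String lookup(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            // Skip malformed <property> entries missing <name> or <value>.
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue;
            }
            if (names.item(0).getTextContent().trim().equals(name)) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    /** True only for one of the grouping choices named in the description. */
    public static boolean isValidGrouper(String value) {
        return value != null && CHOICES.contains(value);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property><name>archive.grouper</name>"
                + "<value>Stream</value></property></configuration>";
        String v = lookup(xml, "archive.grouper");
        System.out.println("archive.grouper = " + v + ", valid: " + isValidGrouper(v));
    }
}
```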
> 
> I looked into the source code of Demux.java, which takes <ChukwaArchiveKey,
> ChunkImpl> (the archiver's output) as input and produces <ChukwaRecordKey,
> ChukwaRecord> as output, but I am not sure why this is not happening in my case.
> 
> I want to feed my final output to the MetricDataLoader class, which takes
> <ChukwaRecordKey, ChukwaRecord> as input. Please let me know if I am missing
> something here.
> 
> What is the correct behavior of this whole process? Any pointers
> regarding this would be helpful.
> 
> 
> Thanks in advance,
> 
> Stuti
> 
> 
> 
> 
> DISCLAIMER
> ==========
> This e-mail may contain privileged and confidential information which is the
> property of Persistent Systems Ltd. It is intended only for the use of the
> individual or entity to which it is addressed. If you are not the intended
> recipient, you are not authorized to read, retain, copy, print, distribute or
> use this message. If you have received this communication in error, please
> notify the sender and delete all copies of this message. Persistent Systems
> Ltd. does not accept any liability for virus infected mails.
> 


