incubator-chukwa-user mailing list archives

From Jerome Boulon <jbou...@netflix.com>
Subject Re: Problem in Output file format in FinalArchives
Date Tue, 18 May 2010 17:18:29 GMT
Hi,
In Chukwa there are 3 background jobs running in parallel.
If you have <ChukwaArchiveKey, ChunkImpl>, it's because you're taking the output
of the archiver or the input to demux. You should take the demux output instead.
The archiver runs in parallel with Demux, so you are probably looking at the
wrong directory.
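
One quick way to check which kind of data a directory holds is to look at the
key and value class names stored at the start of each SequenceFile. The
following is a minimal sketch using only the JDK, not Chukwa or Hadoop code. It
assumes the file was first copied out of HDFS to local disk (for example with
"hadoop fs -get"), that the header layout is "SEQ", a version byte, then the
two class names as length-prefixed UTF-8 strings, and that each name is shorter
than 128 bytes. The class name SeqFileHeaderPeek and its methods are
hypothetical.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: peek at a SequenceFile header to see which key/value classes it holds. */
class SeqFileHeaderPeek {

    /** Reads one string stored as a length prefix followed by UTF-8 bytes. */
    static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        // Simplification: only a single-byte length prefix is handled,
        // which is enough for class names shorter than 128 bytes.
        if (len < 0) {
            throw new IOException("multi-byte length prefix not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        // Assumption: versions below 4 store the class names in an older encoding.
        if (magic[3] < 4) {
            throw new IOException("older header layout not handled in this sketch");
        }
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] {keyClass, valueClass};
    }

    /** Maps the key class to the pipeline stage described in this thread. */
    static String describe(String keyClass) {
        if (keyClass.endsWith("ChukwaArchiveKey")) {
            return "raw chunks (collector or archiver output, input to Demux)";
        }
        if (keyClass.endsWith("ChukwaRecordKey")) {
            return "Demux output";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String[] kv = readKeyValueClasses(in);
            System.out.println(kv[0] + " / " + kv[1] + " -> " + describe(kv[0]));
        }
    }
}
```

Running it against one file from each candidate directory should show which
directory holds <ChukwaRecordKey, ChukwaRecord> data.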


Also, there's a bigger issue in having a second M/R job to convert Demux output
to Avro. You'll have to read the data twice for nothing, and since you are not
using the demux output per se, you'll have to delete it afterwards.

In Honu, I fixed this by having a generic output format: for example, I can
output Text or SeqFile using Hive-specific serialization, and for Avro you
should do the same. In Honu, I can define the table schema that I want per
dataType, and Demux then does everything automatically.
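
To illustrate the idea of a generic output step with a schema per dataType,
here is a rough, JDK-only sketch. It is not Honu's or Chukwa's actual API:
PluggableOutputSketch, RecordSerializer, TSV and KEY_VALUE are hypothetical
names, and the KEY_VALUE serializer only stands in for a second format such as
Avro. The point is that the serializer is chosen once and applied while records
are emitted, so no second pass over the data is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pluggable output step with a schema per dataType. */
class PluggableOutputSketch {

    /** Turns one parsed record into its serialized form, given the column order. */
    interface RecordSerializer {
        String serialize(List<String> columns, Map<String, String> record);
    }

    /** Tab-separated text; fields missing from the record become empty cells. */
    static final RecordSerializer TSV = (columns, record) -> {
        List<String> cells = new ArrayList<>();
        for (String column : columns) {
            cells.add(record.getOrDefault(column, ""));
        }
        return String.join("\t", cells);
    };

    /** name=value pairs; a stand-in for another target format. */
    static final RecordSerializer KEY_VALUE = (columns, record) -> {
        List<String> cells = new ArrayList<>();
        for (String column : columns) {
            cells.add(column + "=" + record.getOrDefault(column, ""));
        }
        return String.join(",", cells);
    };

    private final Map<String, List<String>> schemaByDataType = new LinkedHashMap<>();
    private final RecordSerializer serializer;

    PluggableOutputSketch(RecordSerializer serializer) {
        this.serializer = serializer;
    }

    /** Registers the ordered column list to use for one dataType. */
    void defineSchema(String dataType, List<String> columns) {
        schemaByDataType.put(dataType, new ArrayList<>(columns));
    }

    /** Serializes a record directly in the target format, using its dataType's schema. */
    String emit(String dataType, Map<String, String> record) {
        List<String> columns = schemaByDataType.get(dataType);
        if (columns == null) {
            throw new IllegalArgumentException("no schema defined for dataType " + dataType);
        }
        return serializer.serialize(columns, record);
    }
}
```

Swapping TSV for another RecordSerializer changes the output format without
touching the code that parses the records.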

Can you open a Jira for that and/or send me detailed information on how you
are outputting to Avro? Do you have an Avro output format? What kind of
schema do you support?

Regards,
  /Jerome.

On 5/17/10 10:20 PM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:

> Thanks Jerome,
> 
> Actually my use case is like this :
> 
> Step 1 : I give my logs to chukwa and it processes the logs to generate the
> sequence files.
> 
> Step 2 : Then I feed these output sequence files to my map-reduce program to
> convert them into Avro format.
> 
> Problem in Step 1 :
> 
> My Map-Reduce program expects <ChukwaRecordKey, ChukwaRecord> as Input for
> further processing.
> 
> Now the final sequence files which I am getting are of format
> <ChukwaArchiveKey, ChunkImpl>, generated in the Final Archives folder.
> But instead of this I want my final output to be of format <ChukwaRecordKey,
> ChukwaRecord>, generated by the demuxer, to do further processing.
> 
> Please correct me if my understanding is not right. Logs will be archived
> first and then output of Archiver will be input to Demuxer to generate final
> output. According to this I must get <ChukwaRecordKey, ChukwaRecord> as a
> final file.
> 
> Are there any configuration settings to be done on the Chukwa side to achieve
> the desired output?
> 
> No problem in Step 2.
> 
> What should be the correct behavior of this whole process? Any pointers
> regarding this would be helpful.
> 
> Thanks in advance,
> Stuti
> 
> -----Original Message-----
> From: Jerome Boulon [mailto:jboulon@netflix.com]
> Sent: Monday, May 17, 2010 10:09 PM
> To: chukwa-user@hadoop.apache.org
> Subject: Re: Problem in Output file format in FinalArchives
> 
> Hi Stuti,
> 
> There are 2 outputs in Chukwa.
> 1- Collectors write SeqFiles in this format: <ChukwaArchiveKey,
> ChunkImpl>
> 1.1- Archives are in the same format: <ChukwaArchiveKey,
> ChunkImpl>
> 2- Demux output is in this format: <ChukwaRecordKey, ChukwaRecord>
> 
> So if you want to have your Demux output in Avro format then you need to
> have your own AvroOutputFormat in Demux.
> I've already done some work to be able to use any Hadoop output format at
> the demux level, but I haven't published my code yet.
> 
> What is your time range?
> 
> /Jerome.
> 
> 
> On 5/16/10 9:58 PM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:
> 
>> 
>> 
>> Hello Guys,
>> 
>> 
>> I am a newbie to chukwa and I am trying to convert the chukwa sequence files
>> produced by the demuxer (<ChukwaRecordKey, ChukwaRecord>) to Avro
>> format. Currently I am using Chukwa 0.3.0.
>> 
>> I could set up and run chukwa successfully on an Ubuntu machine; the agent and
>> collector started successfully and files were created in the finalArchives
>> folder.
>> 
>> The output format of the files in FinalArchives is of type <ChukwaArchiveKey
>> and ChunkImpl> but according to the chukwa document and my findings I think
>> that files should be of format <ChukwaRecordKey, ChukwaRecord>.
>> 
>> I used chukwa-data-processors.sh to start the data processors and changed the
>> archive.grouper property in chukwa-demux-conf.xml to Stream.
>> <property>
>> <name>archive.grouper</name>
>> <value>Stream</value>
>> <description>How to group archive files. Choices are Hourly, Daily, DataType,
>> and Stream.</description>
>> </property>
>> 
>>   I looked into the source code of demux.java which takes <ChukwaArchiveKey
>> and ChunkImpl> (o/p of archiver) as input and gives <ChukwaRecordKey,
>> ChukwaRecord> as output. But I am not sure why it is not happening in my
>> case.
>> 
>> I want to feed my final output to the MetricDataLoader class, which takes
>> <ChukwaRecordKey, ChukwaRecord> as input. Please let me know if I am missing
>> something here.
>> 
>> What should be the correct behavior of this whole process? Any pointers
>> regarding this would be helpful.
>> 
>> 
>> Thanks in advance,
>> 
>> Stuti
>> 
>> 
>> 
>> 
>> DISCLAIMER
>> ==========
>> This e-mail may contain privileged and confidential information which is the
>> property of Persistent Systems Ltd. It is intended only for the use of the
>> individual or entity to which it is addressed. If you are not the intended
>> recipient, you are not authorized to read, retain, copy, print, distribute or
>> use this message. If you have received this communication in error, please
>> notify the sender and delete all copies of this message. Persistent Systems
>> Ltd. does not accept any liability for virus infected mails.
>> 
> 
> 
> 

