incubator-chukwa-user mailing list archives

From Stuti Awasthi <Stuti_Awas...@persistent.co.in>
Subject RE: Problem in Output file format in FinalArchives
Date Wed, 19 May 2010 15:17:48 GMT
Hi,

As you suggested, I have rechecked my whole process. The process flow is like this:

/chukwa/
   archivesProcessing/
   dataSinkArchives/
   demuxProcessing/
   finalArchives/
   logs/
   postProcess/
   repos/
   rolling/
   temp/

1) Collectors write chunks to logs/*.chukwa files, then close the chunks and rename them to logs/*.done

2) DemuxManager checks for *.done files, moves them in place to demuxProcessing/mrInput, and
then to dataSinkArchives/[yyyyMMdd]/*/*.done

   At this point I stopped my data processor and checked the .done file in the dataSinkArchives
directory. Surprisingly, the output was still in the format <ChukwaArchiveKey, ChunkImpl>,
even though I have not run the Archiver in this case.

I think I am looking in the correct directory. Please correct me if I am heading in the wrong direction.
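
To make the flow above easier to check, here is a minimal Java sketch that records which record type each directory is expected to hold. It is only an illustration: the class ChukwaDirFormats and its method are hypothetical, not part of Chukwa. The chunk-format entries follow what is described in this thread (the .done files are moved unchanged from logs/ to demuxProcessing/mrInput and then to dataSinkArchives/). The Demux output locations (demuxProcessing/mrOutput, postProcess/, repos/) are an assumption based on my reading of the Chukwa data-flow documentation and should be verified against the installed version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Chukwa: maps each stage directory
// (relative to /chukwa/) to the record type it is expected to hold.
class ChukwaDirFormats {
    static final String CHUNKS = "<ChukwaArchiveKey, ChunkImpl>";
    static final String RECORDS = "<ChukwaRecordKey, ChukwaRecord>";

    // Insertion-ordered so prefixes are checked in the order listed.
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        // Raw collector chunks; the same .done files are moved, unchanged.
        EXPECTED.put("logs/", CHUNKS);
        EXPECTED.put("demuxProcessing/mrInput", CHUNKS);
        EXPECTED.put("dataSinkArchives/", CHUNKS);
        // Archiver output keeps the chunk format.
        EXPECTED.put("finalArchives/", CHUNKS);
        // ASSUMPTION: Demux output locations; verify on your Chukwa version.
        EXPECTED.put("demuxProcessing/mrOutput", RECORDS);
        EXPECTED.put("postProcess/", RECORDS);
        EXPECTED.put("repos/", RECORDS);
    }

    /** Returns the expected format for a path relative to /chukwa/, or "unknown". */
    static String expectedFormat(String relativePath) {
        for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
            if (relativePath.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The file names below are made up for illustration.
        System.out.println(expectedFormat("dataSinkArchives/20100519/x/a.done"));
        System.out.println(expectedFormat("repos/a.evt"));
    }
}
```

Under these assumptions, a .done file in dataSinkArchives/ is still demux input, so seeing <ChukwaArchiveKey, ChunkImpl> there would be expected whether or not the Archiver has run.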

Thanks

-----Original Message-----
From: Jerome Boulon [mailto:jboulon@netflix.com] 
Sent: Tuesday, May 18, 2010 10:48 PM
To: chukwa-user@hadoop.apache.org
Subject: Re: Problem in Output file format in FinalArchives

Hi,
In Chukwa there are 3 background jobs running in parallel.
If you have <ChukwaArchiveKey, ChunkImpl>, it's because you're taking output
from the archiver or input to demux. You should take demux output instead.
The Archiver runs in parallel with Demux, so you are probably looking at the
wrong directory.
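
One way to tell which kind of file you are looking at is to read the key and value class names from the SequenceFile header. The sketch below is a stand-alone Java illustration, not Chukwa code: SeqFileHeaderPeek is a hypothetical name, and the parser assumes the usual Hadoop header layout as I understand it (the bytes "SEQ", a version byte of 4 or higher, then the key and value class names as length-prefixed UTF-8 strings). It only handles names of up to 127 bytes. With Hadoop on the classpath, SequenceFile.Reader offers getKeyClassName() and getValueClassName() for the same purpose.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: peeks at a SequenceFile header without Hadoop.
class SeqFileHeaderPeek {
    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile (missing SEQ magic)");
        }
        int version = in.readByte();
        if (version < 4) {
            // ASSUMPTION: older versions encode the names differently.
            throw new IOException("header version " + version + " not handled by this sketch");
        }
        String key = readShortString(in);
        String value = readShortString(in);
        return new String[] { key, value };
    }

    // Handles only a one-byte length prefix (names up to 127 bytes).
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length not handled by this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** True when the header names the demux output types, matched by simple class name. */
    static boolean isDemuxOutput(String[] keyValue) {
        return keyValue[0].endsWith("ChukwaRecordKey") && keyValue[1].endsWith("ChukwaRecord");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a SequenceFile copied to the local filesystem
        try (InputStream in = new FileInputStream(args[0])) {
            String[] kv = readKeyValueClasses(in);
            System.out.println("key=" + kv[0] + " value=" + kv[1]
                    + " demuxOutput=" + isDemuxOutput(kv));
        }
    }
}
```

If the header names ChukwaArchiveKey and ChunkImpl, the file is archiver output or demux input rather than demux output.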


Also, there's a bigger issue with having a second M/R job to convert Demux output
to Avro. You'll have to read the data twice for nothing, and since you
are not using the demux output per se, you'll have to delete it.

In Honu, I fixed this by having a generic output format; for example, I can
output Text or SeqFile using Hive-specific serialization, and you should do
the same for Avro. In Honu, I can define the table schema that I want per
dataType, and Demux then does everything automatically.
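
As a rough illustration of the idea (a schema defined per dataType plus a pluggable output format), here is a minimal Java sketch. It is not Honu or Chukwa code: SchemaDrivenOutput, RowSerializer and TEXT are hypothetical names, and the tab-separated text serializer is only a stand-in for a Hive-friendly format. An Avro serializer would be another implementation of the same interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; none of these names come from Honu or Chukwa.
class SchemaDrivenOutput {
    /** Turns one record into an output row, given the column order. */
    interface RowSerializer {
        String serialize(List<String> columns, Map<String, String> record);
    }

    /** Tab-separated text; missing fields become empty columns. */
    static final RowSerializer TEXT = (columns, record) -> {
        StringJoiner row = new StringJoiner("\t");
        for (String column : columns) {
            row.add(record.getOrDefault(column, ""));
        }
        return row.toString();
    };

    // dataType -> ordered column list (the "table schema" per dataType)
    private final Map<String, List<String>> schemas = new HashMap<>();
    private final RowSerializer serializer;

    SchemaDrivenOutput(RowSerializer serializer) {
        this.serializer = serializer;
    }

    void defineSchema(String dataType, String... columns) {
        schemas.put(dataType, Arrays.asList(columns));
    }

    /** Serializes a record using the schema registered for its dataType. */
    String write(String dataType, Map<String, String> record) {
        List<String> columns = schemas.get(dataType);
        if (columns == null) {
            throw new IllegalArgumentException("no schema for dataType " + dataType);
        }
        return serializer.serialize(columns, record);
    }
}
```

With this shape, switching the output from text to another format means passing a different RowSerializer to the constructor, while the per-dataType schemas stay the same.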

Can you open a Jira for that and/or send me detailed information on how you
are outputting to Avro? Do you have an Avro output format? What kind of
schema do you support?

Regards,
  /Jerome.

On 5/17/10 10:20 PM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:

> Thanks Jerome,
> 
> Actually my use case is like this :
> 
> Step 1: I give my logs to Chukwa and it processes the logs to generate the
> sequence files.
> 
> Step 2: Then I feed these output sequence files to my map-reduce program to
> convert them into Avro format.
> 
> Problem in Step 1 :
> 
> My Map-Reduce program expects <ChukwaRecordKey, ChukwaRecord> as Input for
> further processing.
> 
> Now the final sequence files which I am getting are of format
> <ChukwaArchiveKey, ChunkImpl>, generated in the finalArchives folder.
> Instead of this, I want my final output in the format <ChukwaRecordKey,
> ChukwaRecord> generated by the demuxer, for further processing.
> 
> Please correct me if my understanding is not right. Logs will be archived
> first, and then the output of the Archiver will be input to the Demuxer to
> generate the final output. According to this I should get <ChukwaRecordKey,
> ChukwaRecord> in the final file.
> 
> Is there any configuration setting to be done on the Chukwa side to achieve
> the desired output?
> 
> No problem in Step 2.
> 
> What should be the correct behavior of this whole process? Any pointers
> regarding this would be helpful.
> 
> Thanks in advance,
> Stuti
> 
> -----Original Message-----
> From: Jerome Boulon [mailto:jboulon@netflix.com]
> Sent: Monday, May 17, 2010 10:09 PM
> To: chukwa-user@hadoop.apache.org
> Subject: Re: Problem in Output file format in FinalArchives
> 
> Hi Stuti,
> 
> There are 2 outputs in Chukwa.
> 1- Collectors write SeqFiles in this format: <ChukwaArchiveKey,
> ChunkImpl>
> 1.1- Archives are in the same format: <ChukwaArchiveKey,
> ChunkImpl>
> 2- Demux output is in this format: <ChukwaRecordKey, ChukwaRecord>
> 
> So if you want to have your Demux output in Avro format then you need to
> have your own AvroOutputFormat in Demux.
> I've already done some work to be able to use any Hadoop output format at
> the demux level, but I haven't published my code yet.
> 
> What is your time range?
> 
> /Jerome.
> 
> 
> On 5/16/10 9:58 PM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:
> 
>> 
>> 
>> Hello Guys,
>> 
>> 
>> I am a newbie to Chukwa and I am trying to convert the Chukwa sequence file
>> produced by the demuxer (<ChukwaRecordKey, ChukwaRecord> format) to Avro
>> format. Currently I am using Chukwa 0.3.0.
>> 
>> I could set up and run Chukwa successfully on an Ubuntu machine; the agent and
>> collector started successfully and files were created in the finalArchives
>> folder.
>> 
>> The output format of the files in finalArchives is <ChukwaArchiveKey,
>> ChunkImpl>, but according to the Chukwa documentation and my findings I think
>> the files should be of format <ChukwaRecordKey, ChukwaRecord>.
>> 
>> I used chukwa-data-processors.sh to start the data processor and changed the
>> archive.grouper property in chukwa-demux-conf.xml to Stream:
>> <property>
>> <name>archive.grouper</name>
>> <value>Stream</value>
>> <description>How to group archive files. Choices are Hourly, Daily, DataType,
>> and Stream.</description>
>> </property>
>> 
>> I looked into the source code of demux.java, which takes <ChukwaArchiveKey,
>> ChunkImpl> (the output of the archiver) as input and gives <ChukwaRecordKey,
>> ChukwaRecord> as output. But I am not sure why this is not happening in my
>> case.
>> 
>> I want to feed my final output to the MetricDataLoader class, which takes
>> <ChukwaRecordKey, ChukwaRecord> as input. Please let me know if I am missing
>> something here.
>> 
>> What should be the correct behavior of this whole process? Any pointers
>> regarding this would be helpful.
>> 
>> 
>> Thanks in advance,
>> 
>> Stuti
>> 
>> 
>> 
>> 
>> DISCLAIMER
>> ==========
>> This e-mail may contain privileged and confidential information which is the
>> property of Persistent Systems Ltd. It is intended only for the use of the
>> individual or entity to which it is addressed. If you are not the intended
>> recipient, you are not authorized to read, retain, copy, print, distribute or
>> use this message. If you have received this communication in error, please
>> notify the sender and delete all copies of this message. Persistent Systems
>> Ltd. does not accept any liability for virus infected mails.
>> 
> 
> 

