incubator-chukwa-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: Why ChukwaRecord have only Key:Value as Strings
Date Tue, 27 Jul 2010 01:10:45 GMT
Avro can be used as a file format as well as a serialization format for RPC.
Doug Cutting just gave a presentation about this at HUG last Wednesday, and
Avro trunk has MapReduce code that outputs the Avro file format.

Regards,
Eric

On 7/26/10 5:48 PM, "Jerome Boulon" <jboulon@netflix.com> wrote:

> Hi Eric,
> Can you clarify what "Avro format" means?
> From my understanding, Avro is a serialization format, not a file format.
> 
> So, are you thinking of a new file format like Tfile/SeqFile/RCFile?
> If yes can you give a pointer to that new file format?
> 
> Thanks,
>   /Jerome.
> 
> 
> On 7/26/10 4:06 PM, "Eric Yang" <eyang@yahoo-inc.com> wrote:
> 
>> 
>> 
>>> > 1)     How will Avro be used with Chukwa?
>> 
>> Avro can improve flexibility for both ChukwaArchiveKey and ChukwaRecord.
>> The current key representation is optimized for the time series use case
>> only. Ideally, having more dynamic key metadata would also help with
>> channeling users' data.
>> 
>>> > 2)     Will all Chukwa files be in Avro format?
>> 
>> Chukwa files are currently in sequence file format.  They will be converted
>> to Avro if the community votes to do so.  I am using HBase as my storage
>> sink, hence my use case doesn't apply.
>> 
>>> > 3)     Are there any plans to enhance Chukwa record format?
>> 
>> I haven't given much thought to this.  Ideally, ChukwaArchiveKey is an Avro
>> object with a reserved metadata field, and Record is a byte[] that holds an
>> Avro object. We will implement a comparator to compare time series plus a
>> couple of dimensions.
>> 
>> If Chukwa converts to avro, then you get your use case for free.  However, I
>> am not sure who will be writing the implementation.  If you are interested
>> in writing this, you are welcome to contribute.
>> 
>>> > I have written an Adapter and Parser for a multiline record format. If
>>> > Chukwa will be using the Avro format, then I will also have to change my
>>> > code. Currently I am processing the log files in Chukwa and converting
>>> > them to Avro format to keep them in HDFS. If you are planning to include
>>> > Avro in Chukwa, does that mean that all the Chukwa files will be in Avro
>>> > format?
>> 
>> My data will be in Avro in HBase, and the data is also mirrored to live on
>> in sequence files as String or bytes for the short term.  In the long run,
>> when someone has implemented a format superior to sequence file and TFile,
>> the Chukwa community may be interested in moving.  This is currently not
>> the top priority.  A plain Avro file on HDFS should perform better than a
>> sequence file, but we are waiting for Avro 1.4 to age a little bit longer
>> before making the jump.
>> 
>> Regards,
>> Eric
>> 
>>> > Please Suggest
>>> > Stuti
>>> > 
>>> >
>>> > From: Eric Yang [mailto:eyang@yahoo-inc.com]
>>> > Sent: Friday, July 23, 2010 10:06 PM
>>> > To: chukwa-user@hadoop.apache.org
>>> > Cc: Jerome Boulon
>>> > Subject: Re: Why ChukwaRecord have only Key:Value as Strings
>>> > 
>>> > Initially, ChukwaRecord only supported String because it was made to
>>> > process text log files.  We were naïve to think that we could use JSON
>>> > for all our data. There is a plan to use Avro instead of supporting
>>> > generic types when the Avro MapReduce input/output format is ready next
>>> > month.  This provides better metadata support inside the data for the
>>> > processing system.
>>> >
>>> > Regards,
>>> > Eric
>>> >
>>> > On 7/23/10 5:03 AM, "Stuti Awasthi" <Stuti_Awasthi@persistent.co.in> wrote:
>>> > Hi all,
>>> > 
>>> > I was looking at the code of ChukwaRecord and found out that it only adds
>>> > <String key, String value> pairs.
>>> > 
>>> > Snippet :
>>> > 
>>> > public class ChukwaRecord extends ChukwaRecordJT implements Record
>>> > {
>>> >     public void add(String key, String value)
>>> > }
>>> > 
>>> > I have a scenario in which I want to add an Object as the value, i.e.
>>> > <String key, Object value>.
>>> > 
>>> > Does Chukwa's current implementation support that, or is a patch available?
>>> > 
>>> > Stuti
>>> > 
>>> > 
>>> > Thanks and Regards
>>> > 
>>> > Stuti Awasthi | Software Engineer - IBM BU | Persistent Systems Limited
>>> > stuti_awasthi@persistent.co.in <mailto:chandan_avdhut@persistent.co.in> |
>>> > Tel: +91 (20) 391 77837
>>> >
>>> > 
>>> > DISCLAIMER ========== This e-mail may contain privileged and confidential
>>> > information which is the property of Persistent Systems Ltd. It is
>>> intended
>>> > only for the use of the individual or entity to which it is addressed. If
>>> you
>>> > are not the intended recipient, you are not authorized to read, retain,
>>> copy,
>>> > print, distribute or use this message. If you have received this
>>> communication
>>> > in error, please notify the sender and delete all copies of this message.
>>> > Persistent Systems Ltd. does not accept any liability for virus infected
>>> > mails.
>>> >
>> 
>> 
>> 
> 
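Eric's answer to question 3 in the thread above mentions a comparator that orders keys by time series plus a couple of dimensions. The following is only a rough sketch of what such an ordering could look like in plain Java. SketchKey and its fields (timePartition, dataType, source) are hypothetical stand-ins chosen for illustration; they are not Chukwa's actual ChukwaArchiveKey, and the choice of dimensions is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an archive key; NOT Chukwa's real ChukwaArchiveKey.
final class SketchKey {
    final long timePartition; // time-series component, e.g. epoch milliseconds
    final String dataType;    // first extra dimension (assumed for illustration)
    final String source;      // second extra dimension (assumed for illustration)

    SketchKey(long timePartition, String dataType, String source) {
        this.timePartition = timePartition;
        this.dataType = dataType;
        this.source = source;
    }

    @Override
    public String toString() {
        return timePartition + "/" + dataType + "/" + source;
    }
}

final class SketchKeyOrder {
    // Compare by time first, then fall back to each dimension in turn.
    static final Comparator<SketchKey> BY_TIME_THEN_DIMENSIONS =
        Comparator.comparingLong((SketchKey k) -> k.timePartition)
                  .thenComparing(k -> k.dataType)
                  .thenComparing(k -> k.source);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<SketchKey> sorted(List<SketchKey> keys) {
        List<SketchKey> copy = new ArrayList<>(keys);
        copy.sort(BY_TIME_THEN_DIMENSIONS);
        return copy;
    }

    public static void main(String[] args) {
        List<SketchKey> keys = new ArrayList<>();
        keys.add(new SketchKey(2000L, "SysLog", "hostB"));
        keys.add(new SketchKey(1000L, "SysLog", "hostA"));
        keys.add(new SketchKey(1000L, "Df", "hostZ"));
        for (SketchKey key : sorted(keys)) {
            System.out.println(key);
        }
    }
}
```

Putting time first keeps the time-series ordering that the current key is optimized for, while the extra dimensions only break ties between keys in the same time partition.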


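The original question in this thread asks how to store an Object where ChukwaRecord only accepts String values. The thread's answer is the planned move to Avro. Until then, one possible stopgap (not something proposed in the thread) is to encode a serializable object as a Base64 string before adding it. The sketch below assumes the value implements java.io.Serializable. StringValueCodec is a hypothetical helper, and a plain Map stands in for the record's String fields so the example does not depend on Chukwa.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Chukwa.
final class StringValueCodec {
    // Serialize the object with Java serialization, then Base64-encode the bytes.
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: Base64-decode, then deserialize.
    static Object decode(String text) {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A plain map stands in for the String key/value pairs of a record.
        Map<String, String> record = new HashMap<>();
        ArrayList<Integer> samples = new ArrayList<>();
        samples.add(3);
        samples.add(5);
        record.put("samples", encode(samples));
        System.out.println(decode(record.get("samples")));
    }
}
```

This keeps the record format unchanged, but the encoded values are opaque to downstream processing and larger than the raw bytes, and Java deserialization should only be applied to data from a trusted source. A schema-based format such as Avro, as discussed in the thread, avoids those drawbacks.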