hadoop-common-dev mailing list archives

From Pete Wyckoff <pwyck...@facebook.com>
Subject Re: Serialization with additional schema info
Date Thu, 04 Sep 2008 18:46:17 GMT

I'll just give another plug for Thrift's TRecordStream, which has fixed-size
frames that can optionally be compressed or checksummed; since the frames are a
fixed size, the stream can be split on frame boundaries.

You can write whatever data you want with it; it doesn't have to be Thrift. It
just takes whatever bytes you give it and writes them to an FD, a socket, or
whatever.
 
There is still the issue of records spilling over between frames, just like in
the sequence file case.
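
To make the framing idea concrete, here is a rough toy version in plain java.io
(this is just my sketch of the layout, not the actual Thrift classes, and the
64K frame size and CRC header are numbers I made up):

// Toy fixed-frame stream: the byte stream is chopped into frames of exactly
// FRAME_SIZE bytes, each preceded by a small header (here just a CRC32 of the
// payload).  Because every frame is the same size, frame k starts at byte
// k * (HEADER_SIZE + FRAME_SIZE), so a splitter can compute frame boundaries
// without reading the file.  Records are written back to back and can spill
// across frames, which is the same complication sequence files have.
import java.io.*;
import java.util.zip.CRC32;

public class FixedFrameStream {
  static final int FRAME_SIZE = 64 * 1024;   // hypothetical frame size
  static final int HEADER_SIZE = 8;          // CRC32 stored as a long

  private final DataOutputStream out;
  private final byte[] frame = new byte[FRAME_SIZE];
  private int used = 0;

  FixedFrameStream(OutputStream raw) {
    this.out = new DataOutputStream(raw);
  }

  // Append a record; it may spill over into the next frame.
  void writeRecord(byte[] rec) throws IOException {
    int off = 0;
    while (off < rec.length) {
      int n = Math.min(rec.length - off, FRAME_SIZE - used);
      System.arraycopy(rec, off, frame, used, n);
      used += n;
      off += n;
      if (used == FRAME_SIZE) {
        flushFrame(FRAME_SIZE);
      }
    }
  }

  // No padding: the last frame is simply shorter than FRAME_SIZE.
  void close() throws IOException {
    if (used > 0) {
      flushFrame(used);
    }
    out.close();
  }

  private void flushFrame(int len) throws IOException {
    CRC32 crc = new CRC32();
    crc.update(frame, 0, len);
    out.writeLong(crc.getValue());   // header: checksum (compression could go here too)
    out.write(frame, 0, len);
    used = 0;
  }
}

The point is just that frame boundaries are computable from offsets alone; a
real implementation would also compress the payload and handle the
spilled-record case on the read side.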

-- pete


On 9/4/08 11:32 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:

> On Thu, Sep 4, 2008 at 10:51 AM, Owen O'Malley <omalley@apache.org> wrote:
> 
>> ...
>> It is also not splittable. It would be really nice to have a codec that was
>> similar in compression/CPU cost to gzip but splittable.
>> 
> 
> Indeed.
> 
> What happened to the effort to build a splittable gzip codec by inserting
> dummy compression resets with a known pattern?
> 
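
Re the splittable gzip question in the quote above, here is one toy reading of
the "dummy reset with a known pattern" idea using java.util.zip (this is only
my sketch, not any actual Hadoop codec: it writes a raw deflate stream with no
gzip header or trailer, and the marker bytes are entirely made up):

import java.io.*;
import java.util.zip.Deflater;

public class ResettableDeflate {
  // Made-up sync marker; a real codec would presumably use something less guessable.
  private static final byte[] MARKER = "SPLIT-SYNC-0123456789abcdef".getBytes();

  // Compress a stream, forcing a FULL_FLUSH every blockSize bytes of input.  A full
  // flush resets the compressor's history, so decompression can restart at that
  // point.  Compressing MARKER right after the reset should produce the same
  // compressed bytes every time (for a given zlib and level), so a splitter can
  // scan the raw compressed file for that pattern to find restart points.
  static void compressWithResets(InputStream in, OutputStream out, int blockSize)
      throws IOException {
    Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);  // raw deflate
    byte[] block = new byte[blockSize];
    byte[] buf = new byte[8192];
    int read;
    while ((read = in.read(block)) > 0) {
      def.setInput(block, 0, read);
      drain(def, out, buf);            // flush block data and reset history
      def.setInput(MARKER);
      drain(def, out, buf);            // marker compressed against empty history
    }
    def.finish();
    while (!def.finished()) {
      out.write(buf, 0, def.deflate(buf));
    }
    def.end();
  }

  // Drain all pending output for the current input, ending with a FULL_FLUSH.
  private static void drain(Deflater def, OutputStream out, byte[] buf)
      throws IOException {
    int n;
    do {
      n = def.deflate(buf, 0, buf.length, Deflater.FULL_FLUSH);
      out.write(buf, 0, n);
    } while (n == buf.length);  // per the javadoc, repeat while the buffer keeps filling
  }
}

The reader side would scan for the marker, skip past it and the flush bytes, and
start inflating with a fresh Inflater; the spill-over problem shows up here too,
since a record can straddle a reset point.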

