hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Serialization format for structured data
Date Tue, 27 May 2008 09:55:07 GMT
Ted Dunning wrote:
> I asked because the IBM guys have some code to be released soon that will
> address some of these issues.
> In particular, Kevin Breyer was at the HUG and we talked a bit about their
> format.  Some important characteristics are:
> - it is pretty much a binary JSON implementation
> - value parsing doesn't happen because it is a binary format.  That means
> that integers, doubles and dates can be read much more quickly.
> - names are repeated in the data just as in JSON, but are easily compressed
> away with any compression technique you care to use.
> - he says it is fast, but I haven't tried it yet.
> So this will address the first point very directly, will indirectly address
> the second point, and the third point may be FUD.
> Relative to Thrift, JSON has the advantage of not requiring a schema as well
> as the disadvantage of not having a schema.  The advantage is that the data
> is more fluid and I don't have to generate code to handle the records.  The
> disadvantage is that I lose some data completeness and typing guarantees.
> On balance, I would like to use JSON-like data quite a bit in ad hoc data
> streams and in logs where the producer and consumer of the data are not
> visible to parts of the data processing chain.
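
The two technical claims in the quoted list (binary values skip text parsing, and repeated names compress away) can be illustrated with a minimal Python sketch. This is not the IBM format, which the quoted message describes only as soon to be released; the record layout, field names, and function names below are assumptions made up for illustration.

```python
import json
import struct
import zlib

# Hypothetical fixed layout: one little-endian 64-bit integer and one 64-bit double.
RECORD = struct.Struct("<qd")


def encode_binary(count, ratio):
    """Pack the two values as raw bytes; no digits are ever written out."""
    return RECORD.pack(count, ratio)


def decode_binary(payload):
    """Read the values back by copying bytes, with no text-to-number parsing."""
    return RECORD.unpack(payload)


def encode_text(count, ratio):
    """Encode the same values as JSON text, names included."""
    return json.dumps({"count": count, "ratio": ratio}).encode("utf-8")


def decode_text(payload):
    """Decode JSON text; the parser must convert digit strings to numbers."""
    obj = json.loads(payload.decode("utf-8"))
    return obj["count"], obj["ratio"]


def repeated_names_sizes(n):
    """Return (raw, compressed) byte sizes for n JSON records that repeat
    the same (made-up) field names in every record."""
    rows = [{"customer_identifier": i, "purchase_total": i * 1.5} for i in range(n)]
    raw = json.dumps(rows).encode("utf-8")
    return len(raw), len(zlib.compress(raw))
```

Calling `repeated_names_sizes(1000)` and comparing the two sizes gives a rough feel for how much of the JSON payload is repeated names. Whether a real binary JSON format is actually faster end to end is a separate question that only measurement can answer.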

For some reason I'm reminded of all the binary-SOAP-fixes-your-problems
stories. I will wait and see what comes out, but given my experience with
SOAP, I'm already nervous. I just hope we don't see WSDL-for-JSON or
something similar.
