hadoop-common-user mailing list archives

From "Vivek Ratan" <viv...@yahoo-inc.com>
Subject RE: Serialization format for structured data
Date Tue, 27 May 2008 03:18:47 GMT
It's a finished product in the sense that the functionality works and is
robust, and it is being used by some folks. You can find more examples
through the test cases (src\test\org\apache\hadoop\record) and some C++
examples at src\c++\librecordio\test.  

> -----Original Message-----
> From: the.stuart.sierra@gmail.com 
> [mailto:the.stuart.sierra@gmail.com] On Behalf Of Stuart Sierra
> Sent: Tuesday, May 27, 2008 1:06 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Serialization format for structured data
> 
> Thanks, Vivek, I wasn't aware of Record I/O.  For the record
> (no pun intended), I found the docs here:
> http://hadoop.apache.org/core/docs/r0.17.0/api/org/apache/hadoop/record/package-summary.html
> 
> I don't recall seeing Record I/O in any tutorials or on the 
> wiki.  Is it a finished product, or more of an experimental 
> sub-project right now?
> 
> -Stuart
> 
> 
> On Sun, May 25, 2008 at 11:12 PM, Vivek Ratan 
> <vivekr@yahoo-inc.com> wrote:
> > Am joining the conversation late, but another option is Hadoop's
> > own RecordIO. Like with Thrift, you need to use compiler-generated
> > stubs to read and write records, but it also supports schemas. You can
> > de/serialize schemas separately from content, which gives you lots of
> > flexibility.
> >
> >> -----Original Message-----
> >> From: Bryan Duxbury [mailto:bryan@rapleaf.com]
> >> Sent: Saturday, May 24, 2008 12:13 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Serialization format for structured data
> >>
> >>
> >> On May 23, 2008, at 9:51 AM, Ted Dunning wrote:
> >> > Relative to thrift, JSON has the advantage of not requiring a schema
> >> > as well as the disadvantage of not having a schema.  The advantage is
> >> > that the data is more fluid and I don't have to generate code to
> >> > handle the records.  The disadvantage is that I lose some data
> >> > completeness and typing guarantees.
> >> > On balance, I would like to use JSON-like data quite a bit in ad hoc
> >> > data streams and in logs where the producer and consumer of the data
> >> > are not visible to parts of the data processing chain.
> >>
> >> That about sums it up. If you want schema, Thrift is your friend. If
> >> you don't, JSON probably will do pretty well for you.
> >>
> >> -Bryan
> >>
> >
> 
