hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [VOTE] Direction for Hadoop development
Date Wed, 08 Dec 2010 19:20:44 GMT
On 12/07/2010 10:25 AM, Owen O'Malley wrote:
> The new code reads the new or old versions of SequenceFile seamlessly
> using auto-detection of the version. The old code fails with an explicit
> message saying that it can't read this version. This is the only
> mechanism available when upgrading a file format with a single version
> number and is the mechanism that we've used 6 times in the past.

The last such change was nearly four years ago, in:

https://issues.apache.org/jira/browse/HADOOP-732

The quantity of data stored in SequenceFiles has greatly increased over
the past four years, and the project's concern for compatibility has
increased correspondingly.

The new format version need not be written when folks are using
Writable or some other serialization currently supported by
SequenceFile.  The only situation in your patch where the new version is
required is for Avro.  You might simply drop support for Avro and leave 
the file version number alone since Avro already includes a container 
file format.  Or you might only use the new format version for 
non-class-determined serializations like Avro.  Or you might use 
SequenceFile's existing metadata for non-class-determined serializations 
like Avro and leave the file version number alone.
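
As a rough illustration of the second option above (bump the version only for non-class-determined serializations), here is a minimal Python sketch. It is not the patch's code or SequenceFile's actual implementation. The version numbers 6 and 7, the serialization names, and the helper functions are assumptions made for illustration; the only detail taken from the format itself is that the header starts with the bytes "SEQ" followed by a one-byte version.

```python
MAGIC = b"SEQ"

# Assumed for illustration: the version existing readers accept, and a
# hypothetical bumped version introduced by the patch.
LEGACY_VERSION = 6
NEW_VERSION = 7

# Hypothetical names for serializations whose types are not fully
# described by a class name alone.
NON_CLASS_DETERMINED = {"avro"}


def choose_version(serialization: str) -> int:
    """Write the new version only when the serialization requires it."""
    if serialization in NON_CLASS_DETERMINED:
        return NEW_VERSION
    return LEGACY_VERSION


def write_header(serialization: str) -> bytes:
    """Build the 4-byte header prefix: magic bytes plus version byte."""
    return MAGIC + bytes([choose_version(serialization)])


def read_version(header: bytes, max_supported: int) -> int:
    """Auto-detect the version, failing explicitly if it is too new."""
    if len(header) < 4 or header[:3] != MAGIC:
        raise ValueError("not a SequenceFile")
    version = header[3]
    if version > max_supported:
        raise ValueError(
            f"file version {version} is newer than supported version {max_supported}"
        )
    return version
```

Under these assumptions, files written with Writable stay readable by old code, and only files that use a serialization like Avro trigger the explicit version failure in old readers.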

Doug
