hadoop-hdfs-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sanjay Radia <san...@hortonworks.com>
Subject Re: [DISCUSS] Remove append?
Date Wed, 21 Mar 2012 20:57:36 GMT
On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <eli@cloudera.com> wrote:

> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around
Hflush, concurrent readers, the pipeline etc. The code and complexity  for
appending to previously closed file was not that large.

> Removing append means we
> have the property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly
> simplifies the implementation of other features like snapshots,
> HDFS-level caching, dedupe, etc.

While Snapshots  are challenging with Append, it is solvable - the snapshot
needs to remember the length of the file. (We have a working prototype - we
will posting the design and the code soon).

I agree that the notion of an immutable file is useful since it lets the
system and tools optimize certain things.  A xerox-parc file system in the
80s had this feature that the system exploited. I would support adding the
notion of an immutable file to Hadoop.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message