hadoop-hdfs-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: [DISCUSS] Remove append?
Date Mon, 26 Mar 2012 21:31:20 GMT
On Mon, Mar 26, 2012 at 1:55 PM, Tsz Wo Sze <szetszwo@yahoo.com> wrote:
>> Just one comment: If we do decide to keep append in, we should get it
>> to be actually stable and usable.  In my opinion, this should
>> definitely happen before adding any new operations.
> @Colin, append is currently stable and, of course, usable.  Many people in
> different organizations have tested it at small and large scale.  However,
> it is not yet in a stable release and so it is not yet heavily used.

The append unit test failed on me recently on Jenkins.  It's possible
that this was due to a Jenkins timeout, or something, but I assumed it
was due to instability at the time.  If it happens again, I'll be sure
to check the backtrace and file a JIRA if needed.

>> I agree that the notion of an immutable file is useful since it lets the
>> system and tools optimize certain things.  A Xerox PARC file system in the
>> 80s had this feature that the system exploited. I would support adding the
>> notion of an immutable file to Hadoop.

I think Eli was hoping that making files immutable would make the
system simpler, and hopefully, less buggy.  You won't get that benefit
if only certain files are immutable.  In fact, quite the contrary--
you'll just be adding more complexity.

I'd also like to see what the "certain things" are that having certain
files, but not others, be immutable would allow you to optimize.  The
thread you linked to from the JIRA has no information on this.

I am aware of at least two "filesystems" (in the loose sense of the
word) that have immutable files.  One is Venti from Plan 9, and the
other is git, by Linus Torvalds.  Both of them are significantly
simpler because of their invariant that files cannot change.  However,
both of them are append-only, meaning that files can never be deleted.
This seems unsuitable for the HDFS use case, and in fact, I see no
reason to believe that having some, but not all, files be immutable
would provide any benefit.

Feel free to prove me wrong if you think of something, though!


> @Sanjay, I filed HDFS-3154.
> @Eli and others, it turns out that the discussion is very useful!  Thanks.
> Nicholas
