hadoop-hdfs-dev mailing list archives

From Tsz Wo Sze <szets...@yahoo.com>
Subject Re: [DISCUSS] Remove append?
Date Tue, 27 Mar 2012 02:46:12 GMT
Hi Colin,

Please feel free to file JIRAs if you see unit test failures.

Let's continue the immutable file discussion on HDFS-3154.

Nicholas




________________________________
 From: Colin McCabe <cmccabe@alumni.cmu.edu>
To: hdfs-dev@hadoop.apache.org; Tsz Wo Sze <szetszwo@yahoo.com> 
Sent: Monday, March 26, 2012 2:31 PM
Subject: Re: [DISCUSS] Remove append?
 
On Mon, Mar 26, 2012 at 1:55 PM, Tsz Wo Sze <szetszwo@yahoo.com> wrote:
>> Just one comment: If we do decide to keep append in, we should get it
>> to be actually stable and usable.  In my opinion, this should
>> definitely happen before adding any new operations.
>
> @Colin, append is currently stable and, of course, usable.  Many people in
> different organizations have tested it at both small and large scale.
> However, it is not yet in a stable release, so it is not yet heavily used.

The append unit test failed on me recently on Jenkins.  It's possible
that this was due to a Jenkins timeout, or something, but I assumed it
was due to instability at the time.  If it happens again, I'll be sure
to check the backtrace and file a JIRA if needed.

>> I agree that the notion of an immutable file is useful since it lets the
>> system and tools optimize certain things.  A Xerox PARC file system in the
>> '80s had this feature, and the system exploited it.  I would support adding
>> the notion of an immutable file to Hadoop.

I think Eli was hoping that making files immutable would make the
system simpler and, hopefully, less buggy.  You won't get that benefit
if only some files are immutable.  In fact, quite the contrary:
you'll just be adding more complexity.

I'd also like to know which "certain things" you could optimize if
some files, but not others, were immutable.  The thread you linked to
from the JIRA has no information on this.

I am aware of at least two "filesystems" (in the loose sense of the
word) that have immutable files.  One is Venti from Plan 9, and the
other is git, by Linus Torvalds.  Both of them are significantly
simpler because of their invariant that files cannot change.  However,
both of them are append-only, meaning that files can never be deleted.
This seems unsuitable for the HDFS use case, and in fact, I see no
reason to believe that having some, but not all, files be immutable
would provide any benefit.

Feel free to prove me wrong if you think of something, though!

cheers,
Colin


>
> @Sanjay, I filed HDFS-3154.
>
> @Eli and others, it turns out that the discussion is very useful!  Thanks.
>
> Nicholas