hadoop-hdfs-dev mailing list archives

From Daryn Sharp <da...@yahoo-inc.com>
Subject Re: [DISCUSS] Remove append?
Date Thu, 22 Mar 2012 17:15:56 GMT
On Mar 20, 2012, at 7:37 PM, Eli Collins wrote:
> Hey gang,
> 
> I'd like to get people's thoughts on the following proposal. I think
> we should consider removing append from HDFS.
> 
> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality
> issues. It and sync were re-designed, re-implemented, and shipped in
> 21.0 (HDFS-265). To my knowledge, there has been no real production
> use. Anecdotally people who worked on branch-20-append have told me
> they think the new trunk code is substantially less well-tested than
> the branch-20-append code (at least for sync, append was never well
> tested). It has certainly gotten way less pounding from HBase users.
> The design however, is much improved, and people think we can get
> hsync (and append) stabilized in trunk (mostly testing and bug
> fixing).

Up front:  I think append is a needed feature.

Politely speaking, I think the premise of the question is a bit dubious due to its circular nature.
I.e., it's not used in production, so is it worth it?  The stigma/perception that append has
been unstable and is not well-tested is a compelling reason for it not to be in production at major
installations.  The situation is going to be akin to "You go first. No, you go first!  No
way, you go first!".

Downstream projects also aren't going to use something until it's stable, so they either work
around the limitation, or...  they choose something other than hdfs.  There's also the unanswerable
question of how many potential users have been silently lost.  We are unlikely to have heard the
user demand from those that chose another solution.  Generally, for every complaint/request,
some large number N of people didn't even bother to speak up.

I envision a day where hdfs is a performant POSIX filesystem.  Dropping append sets us back
from that goal.  Admittedly, I don't know all the intricacies of how append was implemented
and why it is/was difficult.  Is the complexity maybe due to "bolting" append onto code that
wasn't designed with mutability in mind?  (That's truly a question, not a statement) If so,
perhaps a refactoring would simplify the code?

Dropping append also might be used as a cudgel against hdfs.  Cynically speaking, do we want
to risk marketeers from certain competitors saying or implying:  Trust your data with us because
we're so brilliant that we have a feature hdfs has repeatedly tried and failed to implement!

Daryn