hadoop-general mailing list archives

From Stack <st...@duboce.net>
Subject Re: Add Append-HBase support in upcoming 20.205
Date Fri, 02 Sep 2011 04:08:52 GMT

I'm biased.  And if we were adding any other feature but sync/append
in a minor release on 0.20 I'd be praising the work but not voting for
its inclusion.  So I'm a hypocrite too.... but I can't help myself.


P.S. Below is the hbase project's 'official' story on the version of
hadoop users can run against our current stable offering.  It's from
our 'manual' up on the hbase website.  It might look to you like a
mess but the text is actually hard-won after feedback from folks who
have had to navigate its intricacies.

"2.3. Hadoop
This version of HBase will only run on Hadoop 0.20.x. It will not run
on hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is
running on an HDFS that has a durable sync. Hadoop 0.20.2 and Hadoop DO NOT have this attribute. Currently only the
branch-0.20-append branch has a working sync[5]. No official
releases have been made from the branch-0.20-append branch up to now
so you will have to build your own Hadoop from the tip of this branch.
Michael Noll has written a detailed blog, Building an Hadoop 0.20.x
version for HBase 0.90.2, on how to build an Hadoop from
branch-0.20-append. Recommended [6].

Or rather than build your own, you could use the Cloudera or MapR

See http://hbase.apache.org/book.html#hadoop
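
As a rough illustration of the version rule in the quoted text, the check could be sketched in Python as below. This is not part of the HBase manual or of Hadoop; the function name `runs_on_hadoop` is hypothetical. It only encodes the "0.20.x, not 0.21.x nor 0.22.x" rule. It cannot tell whether a given 0.20.x build has a durable sync, which per the text above depends on the branch it was built from.

```python
def runs_on_hadoop(version: str) -> bool:
    """Return True if the version string is in the 0.20.x line.

    Hypothetical helper: encodes only the "0.20.x, not 0.21.x nor 0.22.x"
    rule from the quoted text. It says nothing about durable sync.
    """
    parts = version.strip().split(".")
    # Need at least major and minor components, both numeric.
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return False
    return (int(parts[0]), int(parts[1])) == (0, 20)
```

A version such as "0.20.2" passes this check even though the quoted text says it lacks a durable sync, so a passing result is necessary but not sufficient.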

On Wed, Aug 31, 2011 at 11:41 AM, sanjay Radia <sanjay@hortonworks.com> wrote:
> I propose that the 20-append patches (details below) be included in 20.205 which will
> become the first official Apache release of Hadoop that supports Append and HBase.
> Background:
> There hasn't been an official Apache release that supports HBase.
> The HBase community have instead been using the 20-append branch; the patches were contributed
> by the HBase community including Facebook. The Cloudera distribution has also included these
> Andrew Purtell has ported these patches to 20-security branch.
> Risk Level:
> These patches have been used and tested on large HBase clusters by FB, by those who
> use the 20-append branch directly (various users including a 500 node HBase cluster at Yahoo)
> and by those that use the Cloudera distribution. We have reviewed the patches and have conducted
> further tests; testing and validation continue.
> Patches:
> HDFS-200. Support append and sync for hadoop 0.20 branch.
> HDFS-142. Blocks that are being written by a client are stored in the blocksBeingWritten
> HDFS-1057. Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> HDFS-724.  Use a bidirectional heartbeat to detect stuck pipeline.
> HDFS-895. Allow hflush/sync to occur in parallel with new writes to the file.
> HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease recovery.
> HDFS-1555. Disallow pipeline recovery if a file is already being lease recovered.
> HDFS-1554. New semantics for recoverLease.
> HDFS-988. Fix bug where savenameSpace can corrupt edits log.
> HDFS-826. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline.
> HDFS-630. Client can exclude specific nodes in the write pipeline.
> HDFS-1141. completeFile does not check lease ownership.
> HDFS-1204. Lease expiration should recover single files, not entire lease holder
> HDFS-1254. Support append/sync via the default configuration.
> HDFS-1346. DFSClient receives out of order packet ack.
> HDFS-1054. remove sleep before retry for allocating a block.
