hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Add Append-HBase support in upcoming 20.205
Date Thu, 01 Sep 2011 06:43:17 GMT
On Wed, Aug 31, 2011 at 3:07 PM,  <Milind.Bhandarkar@emc.com> wrote:
> FWIW, Stack has already done the work needed to make sure that HBase works
> with Hadoop 0.22 branch, and I suppose if
> https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it
> removes the last blocker from 0.22.0, so that it can be released.

The 0.22 implementation "works" but there are certainly still bugs in it.

If other HDFS committers familiar with the new append could help here,
that would be very much appreciated.

For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause
HBase to fail to recover its WAL during a crash scenario. There are
some others that I'll likely be working through in the coming months.


> I am cc'ing hbase-dev, since this is relevant to them as well.
> - Milind
> On 8/31/11 11:41 AM, "sanjay Radia" <sanjay@hortonworks.com> wrote:
>>I propose that the 20-append patches (details below) be included in
>>20.205 which will become the first official Apache
>>release of Hadoop that supports Append and HBase.
>>There hasn't been an official Apache release that supports HBase.
>>The HBase community have instead been using the 20-append branch; the
>>patches were contributed by the HBase community including Facebook. The
>>Cloudera distribution has also included these patches.
>>Andrew Purtell has ported these patches to 20-security branch.
>>Risk Level:
>>These patches have been used and tested on large HBase clusters by FB,
>>by those who use 20-append branch directly (various users including a 500
>>node HBase cluster at Yahoo) and by those that use the Cloudera
>>distribution. We have reviewed the patches and have conducted further
>>tests; testing and validation continues.
>>HDFS-200. Support append and sync for hadoop 0.20 branch.
>>HDFS-142. Blocks that are being written by a client are stored in the
>>blocksBeingWritten directory.
>>HDFS-1057.  Concurrent readers hit ChecksumExceptions if following a
>>writer to very end of file
>>HDFS-724.  Use a bidirectional heartbeat to detect stuck pipeline.
>>HDFS-895. Allow hflush/sync to occur in parallel with new writes to the
>>HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease
>>HDFS-1555. Disallow pipeline recovery if a file is already being lease
>>HDFS-1554. New semantics for recoverLease.
>>HDFS-988. Fix bug where savenameSpace can corrupt edits log.
>>HDFS-826. Allow a mechanism for an application to detect that datanode(s)
>>have died in the write pipeline.
>>HDFS-630. Client can exclude specific nodes in the write pipeline.
>>HDFS-1141. completeFile does not check lease ownership.
>>HDFS-1204. Lease expiration should recover single files, not entire lease
>>HDFS-1254. Support append/sync via the default configuration.
>>HDFS-1346. DFSClient receives out of order packet ack.
>>HDFS-1054. remove sleep before retry for allocating a block.

Todd Lipcon
Software Engineer, Cloudera
