hadoop-general mailing list archives

From Stack <st...@duboce.net>
Subject Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Date Sun, 16 Jan 2011 22:57:53 GMT
On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler
<eric14@yahoo-inc.com> wrote:
> 2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years
> of work) in trunk after giving up on the codeline you are suggesting we merge in. That work is
> what distinguishes all post-20 releases from 20 releases in my mind. I don't trust the 20 append
> code line. We've been hurt badly by it.  We did the rewrite only after losing a bunch of
> production data a bunch of times with the previous code line.  I think the various 20 append
> patch lines may be fine for specialized hbase clusters, but they don't have the rigor behind
> them to bet your business on them.


A few comments on the above:

+ Append has had a bunch of work done on it since the Y! dataloss of a
few years ago on an ancestor of the branch-0.20-append codebase (IIRC
the issue you refer to in particular -- the 'dataloss' that happened because
partially written blocks were written to tmp dirs, and on cluster
restart, tmp data was cleared -- has been fixed in
+ You may not trust 0.20-append (or its close cousin over in CDH), but
a bunch of HBasers do. On the one hand, we have little choice.  Until
the *new* append becomes available in a stable Hadoop release, the HBase
project has had to sustain itself (What do you think, 3-6 months before
we see 0.22?  The HBase project can't hold its breath that long).  On the
other hand, the branch-0.20-append work has been carried out by lads
(and lasses!) who know their HDFS.  It's true that it will not have
been tested with Y! rigor, but near-derivatives -- CDH or the FB
branches -- already run HDFS-200-based append in production.

P.S. Don't get me wrong.  HBase is looking forward to the *new* append.
We just need something to suck on in the meantime.
