Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
From: Eric Baldeschwieler
Date: Mon, 17 Jan 2011 15:29:10 -0800
To: general@hadoop.apache.org

Hi Stack,

I feel your pain. We're running a 700 node HBASE cluster containing a HUGE collection of all web pages. Both versions of append were started by engineers working at Yahoo and we've put A LOT of investment into both. I really, really want to see the append issue solved for HBASE!!

My point is simply that we need to separate our concerns. I would 300% support a community of folks building a 0.20-derived version of Hadoop with append, and we know that any new release post 0.20 will contain an append solution. This branch is more backwards-facing.
We are simply trying to share our last two years of 0.20 experience with the community, so that a) folks can use it if they find value in it, and b) this work can be merged into future Hadoop releases (which will have append). We want to share what we have tested, since we believe that the testing is a good chunk of our contribution.

Thanks,
E14

On Jan 16, 2011, at 2:57 PM, Stack wrote:

> On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler wrote:
>> 2) Append is hard. It is so hard we rewrote the entire write pipeline (5 person-years of work) in trunk after giving up on the codeline you are suggesting we merge in. That work is what distinguishes all post-20 releases from 20 releases in my mind. I don't trust the 20 append code line. We've been hurt badly by it. We did the rewrite only after losing a bunch of production data a bunch of times with the previous code line. I think the various 20 append patch lines may be fine for specialized HBase clusters, but they don't have the rigor behind them to bet your business on them.
>
> Eric:
>
> A few comments on the above:
>
> + Append has had a bunch of work done on it since the Y! dataloss of a
> few years ago on an ancestor of the branch-0.20-append codebase (IIRC
> the issue you refer to in particular -- the 'dataloss' because
> partially written blocks were done up in tmp dirs, and on cluster
> restart, tmp data was cleared -- has been fixed in
> branch-0.20-append).
> + You may not trust 0.20-append (or its close cousin over in CDH) but
> a bunch of HBasers do. On the one hand, we have little choice. Until
> the *new* append becomes available in a stable Hadoop, the HBase
> project has had to sustain itself (what do you think, 3-6 months before
> we see 0.22? The HBase project can't hold its breath that long). On the
> other hand, the branch-0.20-append work has been carried out by lads
> (and lasses!) who know their HDFS. It's true that it will not have
> been tested with Y! rigor, but near-derivatives -- CDH or the FB
> branches -- already do HDFS-200-based append in production.
>
> St.Ack
> P.S. Don't get me wrong. HBase is looking forward to the *new* append.
> We just need something to suck on in the meantime.
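[Editor's note for context on what "append" means in this thread: HBase writes every edit to a write-ahead log on HDFS and depends on being able to flush that log to the datanodes, and to reopen it for append after a crash, without first closing the file. The sketch below is illustrative only and is not code from this thread; it uses the stock org.apache.hadoop.fs API, the path and class name are made up, and the durability call was sync() on the 0.20-append line versus hflush() in later releases.]

    // Minimal sketch of the append-plus-flush pattern an HBase-style WAL
    // relies on; assumes a Hadoop build where append is enabled.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalAppendSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path wal = new Path("/hbase/.logs/example-wal");  // hypothetical path

            // Re-open an existing log for append (the HDFS-200-style call),
            // or create it on first use.
            FSDataOutputStream out = fs.exists(wal)
                    ? fs.append(wal)
                    : fs.create(wal);

            // Write a log edit and push it to the datanodes so it survives
            // a crash before the file is ever closed.
            out.write("edit-bytes".getBytes("UTF-8"));
            out.hflush();   // was sync() on the 0.20-append branch

            out.close();
            fs.close();
        }
    }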