From: Joey Echeverria
To: dev@hbase.apache.org
Date: Fri, 2 Sep 2011 15:30:23 -0400
Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal

To add to what Todd said, I actually worked with those guys for
the last 3 years and have used Accumulo in production. It's true that
it would have been better if they had been able to contribute to HBase
rather than go on their own, but it's not easy to contribute to open
source, either officially or unofficially, when you work at NSA.

I think there is precedent for competing and/or "duplicate" Apache
projects; Avro/Thrift and HBase/Cassandra come to mind.

I'm mostly interested in this project setting a precedent for other
work at NSA to be developed as open source.

-Joey

On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon wrote:
> Hey folks,
>
> I've been in touch with this team for the last 18 months or so.
> They're good people, smart, and have a healthy respect for HBase and
> our team. Though they haven't contributed code or participated on the
> lists, I can vouch that they do follow our development and generally
> do understand HBase as well as what makes their system different. In
> the context of the incubator proposal, they're trying to explain why
> their system is different than HBase, and not trying to knock our
> project. They do borrow our ideas, and in the future we'll be able to
> borrow some of theirs. Iterator trees, for example, are distinct from
> coprocessors and have some really nice capabilities which I'm looking
> forward to adapting into HBase.
>
> There are a couple things to keep in mind about the story here:
> - they first evaluated HBase 3 years ago. HBase at that point was not
> usable for their application - I think several of us here remember the
> state of HBase at the time and might have made the same decision. So,
> they started their own project with an internal team of 5-6 people.
> - contributing to open source from within the NSA is not easy, for
> obvious reasons. They've jumped through many hoops to open source
> this, and we should be thankful for that. Now that they're out in open
> source land, I think we'll see them collaborating with us much more
> openly.
>
> I for one look forward to working with these folks, and maybe merging
> the projects some time down the road as the feature lists converge.
>
> -Todd
>
> On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling wrote:
>> Some comments on the proposal and differentiation vs HBase:
>>
>> Access Labels:
>>
>> The proposal claims that this is "unlikely to be adopted [in HBase]". This
>> is completely untrue. This has been discussed many times in the past in
>> relation to our security implementation. It's just been deferred at the
>> moment due to a need to focus on the initial implementation. But it's
>> certainly viewed as a potentially important feature for a future iteration.
>> Contributions always welcome!
>>
>> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
>> HBASE-3025
>>
>>
>> Iterators:
>>
>> What do these provide that RegionObservers don't? I'm speculating since the
>> proposal provides little in the way of details, but if these are "unlikely
>> to be adopted" it's only because coprocessors already offer more extensive
>> functionality.
>>
>>
>> "Flexibility" aka online schema changes and locality groups
>>
>> Locality groups seem to be the only meaningful differentiation in this
>> entire comparison.
>>
>>
>> Testing
>>
>> Performance under "some configurations and conditions" and unsubstantiated
>> "greater data integrity" is not meaningful differentiation.
>>
>>
>> Apache Brand
>>
>> Claims a relationship with HBase. Is there overlapping code or is this just
>> the duplication of functionality? There's no community relationship that
>> I'm aware of. I haven't seen any of the proposed committers on the HBase
>> user and dev lists to this point, so that doesn't set much of a precedent
>> for community interaction.
>>
>>
>> Overall I see no meaningful differentiation vs HBase as an existing project,
>> no past attempts to interact with the most relevant Apache community, and
>> only an, until now, private "community" of government users. I think it's
>> great that they want to open source this. I don't want to discourage that
>> -- go for it! But I don't see what the benefit is of ASF incubating this.
>> I only see the potential for community fragmentation and market confusion
>> over such closely similar projects.
>>
>>
>> Gary
>>
>>
>> On Fri, Sep 2, 2011 at 11:06 AM, Stack wrote:
>>
>>> See here for the incubator proposal:
>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>
>>> Reactions probably better belong over on the incubator mailing list
>>> but I thought a discussion here first might be useful developing a
>>> stance.
>>>
>>> Initial reaction, not having seen the code, is that it seems to be close to
>>> HBase; so close, they call HBase out explicitly in their proposal.
>>>
>>> The cell based 'access labels' seem like a matter of adding
>>> an extra field to KV and their Iterators seem like a specialization on
>>> Coprocessors. The ability to add column families on the fly seems too
>>> minor a difference to call out especially if online schema edits are
>>> now (soon) supported. They talk of locality group like functionality
>>> too -- that could be a significant difference. We would have to see the
>>> code but at first blush, differences look small.
>>>
>>> Yet another BT implementation further divides this contended space.
>>> If there were to be an effort integrating HBase into Accumulo or vice
>>> versa, it's likely to distract significantly from project forward motion (If
>>> the Accumulo fellows were interested in integrating the two projects,
>>> I'd have thought they'd have tried to talk to us before this so that's
>>> probably not their intent).
>>>
>>> On the other hand, if their once-secret project is out in the open, we can
>>> steal the Apache-licensed good bits and....
>>>
>>> What do folks think?
>>>
>>> St.Ack
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434