hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Fri, 02 Sep 2011 19:09:50 GMT
Hey folks,

<wearing my Todd hat and not my Cloudera hat!>

I've been in touch with this team for the last 18 months or so.
They're good people, smart, and have a healthy respect for HBase and
our team. Though they haven't contributed code or participated on the
lists, I can vouch that they do follow our development and generally
do understand HBase as well as what makes their system different. In
the context of the incubator proposal, they're trying to explain why
their system is different than HBase, and not trying to knock our
project. They do borrow our ideas, and in the future we'll be able to
borrow some of theirs. Iterator trees, for example, are distinct from
coprocessors and have some really nice capabilities which I'm looking
forward to adapting into HBase.

There are a couple things to keep in mind about the story here:
- they first evaluated HBase 3 years ago. HBase at that point was not
usable for their application - I think several of us here remember the
state of HBase at the time and might have made the same decision. So,
they started their own project with an internal team of 5-6 people.
- contributing to open source from within the NSA is not easy, for
obvious reasons. They've jumped through many hoops to open source
this, and we should be thankful for that. Now that they're out in open
source land, I think we'll see them collaborating with us much more.

I for one look forward to working with these folks, and maybe merging
the projects some time down the road as the feature lists converge.


On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <ghelmling@gmail.com> wrote:
> Some comments on the proposal and differentiation vs HBase:
> Access Labels:
> The proposal claims that this is "unlikely to be adopted [in HBase]".  This
> is completely untrue.  This has been discussed many times in the past in
> relation to our security implementation.  It's just been deferred at the
> moment due to a need to focus on the initial implementation.  But it's
> certainly viewed as a potentially important feature for a future iteration.
> Contributions always welcome!
> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
> HBASE-3025
> Iterators:
> What do these provide that RegionObservers don't?  I'm speculating since the
> proposal provides little in the way of details, but if these are "unlikely
> to be adopted" it's only because coprocessors already offer more extensive
> functionality.
> "Flexibility" aka online schema changes and locality groups
> Locality groups seem to be the only meaningful differentiation in this
> entire comparison.
> Testing
> Performance under "some configurations and conditions" and unsubstantiated
> "greater data integrity" is not meaningful differentiation.
> Apache Brand
> Claims a relationship with HBase.  Is there overlapping code or is this just
> the duplication of functionality?  There's no community relationship that
> I'm aware of.  I haven't seen any of the proposed committers on the HBase
> user and dev lists to this point, so that doesn't set much of a precedent
> for community interaction.
> Overall I see no meaningful differentiation vs HBase as an existing project,
> no past attempts to interact with the most relevant Apache community, and
> only an, until now, private "community" of government users.  I think it's
> great that they want to open source this.  I don't want to discourage that
> -- go for it!  But I don't see what the benefit is of ASF incubating this.
> I only see the potential for community fragmentation and market confusion
> over such closely similar projects.
> Gary
> On Fri, Sep 2, 2011 at 11:06 AM, Stack <stack@duboce.net> wrote:
>> See here for the incubator proposal:
>> http://wiki.apache.org/incubator/AccumuloProposal
>> Reactions probably better belong over on the incubator mailing list
>> but I thought a discussion here first might be useful developing a
>> stance.
>> Initial reaction, not having seen the code, is that it seems to be close to
>> HBase; so close, they call HBase out explicitly in their proposal.
>> The cell based 'access labels' seem like a matter of adding
>> an extra field to KV and their Iterators seem like a specialization on
>> Coprocessors.  The ability to add column families on the fly seems too
>> minor a difference to call out especially if online schema edits are
>> now (soon) supported.  They talk of locality group like functionality
>> too -- that could be a significant difference.  We would have to see
>> the code but at first blush, differences look small.
>> Yet another BT implementation further divides this contended space.
>> If there were to be an effort integrating HBase into Accumulo or vice
>> versa, it's likely to distract significantly from project forward motion (If
>> the Accumulo fellows were interested in integrating the two projects,
>> I'd have thought they'd have tried to talk to us before this so that's
>> probably not their intent).
>> On the other hand, if their once-secret project is out in the open, we can
>> steal the Apache-licensed good bits and....
>> What do folks think?
>> St.Ack

Todd Lipcon
Software Engineer, Cloudera
