hbase-dev mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Fri, 02 Sep 2011 19:30:23 GMT
To add to what Todd said, I actually worked with those guys for the
last 3 years and have used Accumulo in production. It's true that it
would have been better if they had been able to contribute to HBase
rather than go on their own, but it's not easy to contribute to open
source, either officially or unofficially, when you work at NSA. I
think there is precedent for competing and/or "duplicate" Apache
projects; Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
interested in this project setting a precedent for other work at NSA
to be developed as open source.

-Joey

On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon <todd@cloudera.com> wrote:
> Hey folks,
>
> <wearing my Todd hat and not my Cloudera hat!>
>
> I've been in touch with this team for the last 18 months or so.
> They're good people, smart, and have a healthy respect for HBase and
> our team. Though they haven't contributed code or participated on the
> lists, I can vouch that they do follow our development and generally
> do understand HBase as well as what makes their system different. In
> the context of the incubator proposal, they're trying to explain why
> their system is different than HBase, and not trying to knock our
> project. They do borrow our ideas, and in the future we'll be able to
> borrow some of theirs. Iterator trees, for example, are distinct from
> coprocessors and have some really nice capabilities which I'm looking
> forward to adapting into HBase.
>
> There are a couple things to keep in mind about the story here:
> - they first evaluated HBase 3 years ago. HBase at that point was not
> usable for their application - I think several of us here remember the
> state of HBase at the time and might have made the same decision. So,
> they started their own project with an internal team of 5-6 people.
> - contributing to open source from within the NSA is not easy, for
> obvious reasons. They've jumped through many hoops to open source
> this, and we should be thankful for that. Now that they're out in open
> source land, I think we'll see them collaborating with us much more
> openly.
>
> I for one look forward to working with these folks, and maybe merging
> the projects some time down the road as the feature lists converge.
>
> -Todd
>
> On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <ghelmling@gmail.com> wrote:
>> Some comments on the proposal and differentiation vs HBase:
>>
>> Access Labels:
>>
>> The proposal claims that this is "unlikely to be adopted [in HBase]".  This
>> is completely untrue.  This has been discussed many times in the past in
>> relation to our security implementation.  It's just been deferred at the
>> moment due to a need to focus on the initial implementation.  But it's
>> certainly viewed as a potentially important feature for a future iteration.
>> Contributions always welcome!
>>
>> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
>> HBASE-3025
>>
>>
>> Iterators:
>>
>> What do these provide that RegionObservers don't?  I'm speculating since the
>> proposal provides little in the way of details, but if these are "unlikely
>> to be adopted" it's only because coprocessors already offer more extensive
>> functionality.
>>
>>
>> "Flexibility" aka online schema changes and locality groups
>>
>> Locality groups seem to be the only meaningful differentiation in this
>> entire comparison.
>>
>>
>> Testing
>>
>> Performance under "some configurations and conditions" and unsubstantiated
>> "greater data integrity" is not meaningful differentiation.
>>
>>
>> Apache Brand
>>
>> Claims a relationship with HBase.  Is there overlapping code or is this just
>> the duplication of functionality?  There's no community relationship that
>> I'm aware of.  I haven't seen any of the proposed committers on the HBase
>> user and dev lists to this point, so that doesn't set much of a precedent
>> for community interaction.
>>
>>
>> Overall I see no meaningful differentiation vs HBase as an existing project,
>> no past attempts to interact with the most relevant Apache community, and
>> only an, until now, private "community" of government users.  I think it's
>> great that they want to open source this.  I don't want to discourage that
>> -- go for it!  But I don't see what the benefit is of ASF incubating this.
>> I only see the potential for community fragmentation and market confusion
>> over such closely similar projects.
>>
>>
>> Gary
>>
>>
>> On Fri, Sep 2, 2011 at 11:06 AM, Stack <stack@duboce.net> wrote:
>>
>>> See here for the incubator proposal:
>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>
>>> Reactions probably better belong over on the incubator mailing list
>>> but I thought a discussion here first might be useful developing a
>>> stance.
>>>
>>> Initial reaction, not having seen the code, is that it seems to be close to
>>> HBase; so close, they call HBase out explicitly in their proposal.
>>>
>>> The cell based 'access labels' seem like a matter of adding
>>> an extra field to KV and their Iterators seem like a specialization on
>>> Coprocessors.  The ability to add column families on the fly seems too
>>> minor a difference to call out especially if online schema edits are
>>> now (soon) supported.  They talk of locality group like functionality
>>> too -- that could be a significant difference.  We would have to see the
>>> code but at first blush, differences look small.
>>>
>>> Yet another BT implementation further divides this contended space.
>>> If there were to be an effort integrating HBase into Accumulo or vice
>>> versa, it's likely to distract significantly from project forward motion (If
>>> the Accumulo fellows were interested in integrating the two projects,
>>> I'd have thought they'd have tried to talk to us before this so that's
>>> probably not their intent).
>>>
>>> On the other hand, if their once-secret project is out in the open, we can
>>> steal the Apache-licensed good bits and....
>>>
>>> What do folks think?
>>>
>>> St.Ack
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
