hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Sat, 03 Sep 2011 02:06:17 GMT
> I think there is precedent for competing and/or "duplicate" Apache
> projects, Avro/Thrift and HBase/Cassandra come to mind. 

 
That argument isn't helping you make your case.


Best regards,


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


>________________________________
>From: Joey Echeverria <joey@cloudera.com>
>To: dev@hbase.apache.org
>Sent: Saturday, September 3, 2011 3:30 AM
>Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
>
>To add to what Todd said, I actually worked with those guys for the
>last 3 years and have used Accumulo in production. It's true that it
>would have been better if they had been able to contribute to HBase
>rather than go on their own, but it's not easy to contribute to open
>source, either officially or unofficially when you work at NSA. I
>think there is precedent for competing and/or "duplicate" Apache
>projects, Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
>interested in this project setting a precedent for other work at NSA
>to be developed as open source.
>
>-Joey
>
>On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> Hey folks,
>>
>> <wearing my Todd hat and not my Cloudera hat!>
>>
>> I've been in touch with this team for the last 18 months or so.
>> They're good people, smart, and have a healthy respect for HBase and
>> our team. Though they haven't contributed code or participated on the
>> lists, I can vouch that they do follow our development and generally
>> do understand HBase as well as what makes their system different. In
>> the context of the incubator proposal, they're trying to explain why
>> their system is different than HBase, and not trying to knock our
>> project. They do borrow our ideas, and in the future we'll be able to
>> borrow some of theirs. Iterator trees, for example, are distinct from
>> coprocessors and have some really nice capabilities which I'm looking
>> forward to adapting into HBase.
>>
>> There are a couple things to keep in mind about the story here:
>> - they first evaluated HBase 3 years ago. HBase at that point was not
>> usable for their application - I think several of us here remember the
>> state of HBase at the time and might have made the same decision. So,
>> they started their own project with an internal team of 5-6 people.
>> - contributing to open source from within the NSA is not easy, for
>> obvious reasons. They've jumped through many hoops to open source
>> this, and we should be thankful for that. Now that they're out in open
>> source land, I think we'll see them collaborating with us much more
>> openly.
>>
>> I for one look forward to working with these folks, and maybe merging
>> the projects some time down the road as the feature lists converge.
>>
>> -Todd
>>
>> On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <ghelmling@gmail.com> wrote:
>>> Some comments on the proposal and differentiation vs HBase:
>>>
>>> Access Labels:
>>>
>>> The proposal claims that this is "unlikely to be adopted [in HBase]".  This
>>> is completely untrue.  This has been discussed many times in the past in
>>> relation to our security implementation.  It's just been deferred at the
>>> moment due to a need to focus on the initial implementation.  But it's
>>> certainly viewed as a potentially important feature for a future iteration.
>>> Contributions always welcome!
>>>
>>> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
>>> HBASE-3025
>>>
>>>
>>> Iterators:
>>>
>>> What do these provide that RegionObservers don't?  I'm speculating since the
>>> proposal provides little in the way of details, but if these are "unlikely
>>> to be adopted" it's only because coprocessors already offer more extensive
>>> functionality.
>>>
>>>
>>> "Flexibility" aka online schema changes and locality groups
>>>
>>> Locality groups seem to be the only meaningful differentiation in this
>>> entire comparison.
>>>
>>>
>>> Testing
>>>
>>> Performance under "some configurations and conditions" and unsubstantiated
>>> "greater data integrity" is not meaningful differentiation.
>>>
>>>
>>> Apache Brand
>>>
>>> Claims a relationship with HBase.  Is there overlapping code or is this just
>>> the duplication of functionality?  There's no community relationship that
>>> I'm aware of.  I haven't seen any of the proposed committers on the HBase
>>> user and dev lists to this point, so that doesn't set much of a precedent
>>> for community interaction.
>>>
>>>
>>> Overall I see no meaningful differentiation vs HBase as an existing project,
>>> no past attempts to interact with the most relevant Apache community, and
>>> only an, until now, private "community" of government users.  I think it's
>>> great that they want to open source this.  I don't want to discourage that
>>> -- go for it!  But I don't see what the benefit is of ASF incubating this.
>>> I only see the potential for community fragmentation and market confusion
>>> over such closely similar projects.
>>>
>>>
>>> Gary
>>>
>>>
>>> On Fri, Sep 2, 2011 at 11:06 AM, Stack <stack@duboce.net> wrote:
>>>
>>>> See here for the incubator proposal:
>>>> http://wiki.apache.org/incubator/AccumuloProposal
>>>>
>>>> Reactions probably better belong over on the incubator mailing list
>>>> but I thought a discussion here first might be useful developing a
>>>> stance.
>>>>
>>>> Initial reaction, not having seen the code, is that it seems to be close to
>>>> HBase; so close, they call HBase out explicitly in their proposal.
>>>>
>>>> The cell based 'access labels' seem like a matter of adding
>>>> an extra field to KV and their Iterators seem like a specialization on
>>>> Coprocessors.  The ability to add column families on the fly seems too
>>>> minor a difference to call out especially if online schema edits are
>>>> now (soon) supported.  They talk of locality group like functionality
>>>> too -- that could be a significant difference.  We would have to see the
>>>> code but at first blush, differences look small.
>>>>
>>>> Yet another BT implementation further divides this contended space.
>>>> If there were to be an effort integrating HBase into Accumulo or vice
>>>> versa, it's likely to distract significantly from project forward motion (If
>>>> the Accumulo fellows were interested in integrating the two projects,
>>>> I'd have thought they'd have tried to talk to us before this so that's
>>>> probably not their intent).
>>>>
>>>> On other hand, if their once-secret project is out in the open, we can
>>>> steal the Apache-licensed good bits and....
>>>>
>>>> What do folks think?
>>>>
>>>> St.Ack
>>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
>-- 
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
>
>
>