hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Fri, 02 Sep 2011 19:37:24 GMT
Thanks for the update Joey.
Could someone close to the NSA share what has changed recently that makes
contributing to open source easier?

On Fri, Sep 2, 2011 at 12:30 PM, Joey Echeverria <joey@cloudera.com> wrote:

> To add to what Todd said, I actually worked with those guys for the
> last 3 years and have used Accumulo in production. It's true that it
> would have been better if they had been able to contribute to HBase
> rather than go on their own, but it's not easy to contribute to open
> source, either officially or unofficially when you work at NSA. I
> think there is precedent for competing and/or "duplicate" Apache
> projects, Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
> interested in this project setting a precedent for other work at NSA
> to be developed as open source.
>
> -Joey
>
> On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon <todd@cloudera.com> wrote:
> > Hey folks,
> >
> > <wearing my Todd hat and not my Cloudera hat!>
> >
> > I've been in touch with this team for the last 18 months or so.
> > They're good people, smart, and have a healthy respect for HBase and
> > our team. Though they haven't contributed code or participated on the
> > lists, I can vouch that they do follow our development and generally
> > do understand HBase as well as what makes their system different. In
> > the context of the incubator proposal, they're trying to explain why
> > their system is different than HBase, and not trying to knock our
> > project. They do borrow our ideas, and in the future we'll be able to
> > borrow some of theirs. Iterator trees, for example, are distinct from
> > coprocessors and have some really nice capabilities which I'm looking
> > forward to adapting into HBase.
> >
> > There are a couple things to keep in mind about the story here:
> > - they first evaluated HBase 3 years ago. HBase at that point was not
> > usable for their application - I think several of us here remember the
> > state of HBase at the time and might have made the same decision. So,
> > they started their own project with an internal team of 5-6 people.
> > - contributing to open source from within the NSA is not easy, for
> > obvious reasons. They've jumped through many hoops to open source
> > this, and we should be thankful for that. Now that they're out in open
> > source land, I think we'll see them collaborating with us much more
> > openly.
> >
> > I for one look forward to working with these folks, and maybe merging
> > the projects some time down the road as the feature lists converge.
> >
> > -Todd
> >
> > On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling <ghelmling@gmail.com> wrote:
> >> Some comments on the proposal and differentiation vs HBase:
> >>
> >> Access Labels:
> >>
> >> The proposal claims that this is "unlikely to be adopted [in HBase]".  This
> >> is completely untrue.  This has been discussed many times in the past in
> >> relation to our security implementation.  It's just been deferred at the
> >> moment due to a need to focus on the initial implementation.  But it's
> >> certainly viewed as a potentially important feature for a future iteration.
> >> Contributions always welcome!
> >>
> >> see HBASE-3435: Provide per-column-qualifier and per-key-value security for
> >> HBASE-3025
> >>
> >>
> >> Iterators:
> >>
> >> What do these provide that RegionObservers don't?  I'm speculating since the
> >> proposal provides little in the way of details, but if these are "unlikely
> >> to be adopted" it's only because coprocessors already offer more extensive
> >> functionality.
> >>
> >>
> >> "Flexibility" aka online schema changes and locality groups
> >>
> >> Locality groups seem to be the only meaningful differentiation in this
> >> entire comparison.
> >>
> >>
> >> Testing
> >>
> >> Performance under "some configurations and conditions" and unsubstantiated
> >> "greater data integrity" is not meaningful differentiation.
> >>
> >>
> >> Apache Brand
> >>
> >> Claims a relationship with HBase.  Is there overlapping code or is this just
> >> the duplication of functionality?  There's no community relationship that
> >> I'm aware of.  I haven't seen any of the proposed committers on the HBase
> >> user and dev lists to this point, so that doesn't set much of a precedent
> >> for community interaction.
> >>
> >>
> >> Overall I see no meaningful differentiation vs HBase as an existing project,
> >> no past attempts to interact with the most relevant Apache community, and
> >> only an, until now, private "community" of government users.  I think it's
> >> great that they want to open source this.  I don't want to discourage that
> >> -- go for it!  But I don't see what the benefit is of ASF incubating this.
> >> I only see the potential for community fragmentation and market confusion
> >> over such closely similar projects.
> >>
> >>
> >> Gary
> >>
> >>
> >> On Fri, Sep 2, 2011 at 11:06 AM, Stack <stack@duboce.net> wrote:
> >>
> >>> See here for the incubator proposal:
> >>> http://wiki.apache.org/incubator/AccumuloProposal
> >>>
> >>> Reactions probably better belong over on the incubator mailing list
> >>> but I thought a discussion here first might be useful developing a
> >>> stance.
> >>>
> >>> Initial reaction, not having seen the code, is that it seems to be close to
> >>> HBase; so close, they call HBase out explicitly in their proposal.
> >>>
> >>> The cell based 'access labels' seem like a matter of adding
> >>> an extra field to KV and their Iterators seem like a specialization on
> >>> Coprocessors.  The ability to add column families on the fly seems too
> >>> minor a difference to call out especially if online schema edits are
> >>> now (soon) supported.  They talk of locality group like functionality
> >>> too -- that could be a significant difference.  We would have to see the
> >>> code but at first blush, differences look small.
> >>>
> >>> Yet another BT implementation further divides this contended space.
> >>> If there were to be an effort integrating HBase into Accumulo or vice
> >>> versa, it's likely to distract significantly from project forward motion (If
> >>> the Accumulo fellows were interested in integrating the two projects,
> >>> I'd have thought they'd have tried to talk to us before this so that's
> >>> probably not their intent).
> >>>
> >>> On other hand, if their once-secret project is out in the open, we can
> >>> steal the Apache-licensed good bits and....
> >>>
> >>> What do folks think?
> >>>
> >>> St.Ack
> >>>
> >>
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
