hbase-dev mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Fri, 09 Sep 2011 20:28:45 GMT
Accepting Accumulo into the incubator would be a good encouragement for the
folks at NSA to work more with open source software and engage with the
communities and set a good example for future projects. That, in my mind,
seems to be the strongest reason for letting the project in. However, I
don't see how that helps HBase or the ASF in the long run. It is true that it'll
take time and effort to combine the projects right now, but that might be a
hit worth taking in exchange for combined development efforts from here on,
as compared to having two completely independent projects and looking at a
merge later on. I don't see how a merge later on will be any easier than right
now. The decision obviously comes down to how much effort the developers on
both projects are willing to put into it, now or later.

Having said that, I think the HBase community at large needs to get an
insight into Accumulo's implementation to gauge how different the two
projects are in terms of the implementation details and code. Trying to come
to a conclusion without doing that might not give us the best solution. I'm
excited about the fact that we have an alternate implementation but that's
just the engineer in me. The HBase user in me is worried about the confusion
that a nearly identical alternate project will create.

Just my $0.02.

-ak

On Fri, Sep 9, 2011 at 1:50 PM, Andrew Purtell <apurtell@apache.org> wrote:

> > From: Duane Moore <duane.moore@issinc.com>
>
> > I will second what Todd and Joey
> > said and reiterate that contributing to open source is not easy for a
> > government contractor, and especially not easy for U.S. government
> > employees.
>
>
> This is true as a general statement I'm sure.
>
> However, my former life was as an engineer in a DARPA shop with a TS
> clearance. During that time I worked on both closed/classified systems and
> projects such as TrustedBSD (http://www.trustedbsd.org/). Choosing to
> develop an internal alternative rather than work with the HBase project was
> a decision of convenience by someone.
>
> While all appreciate this eventual open sourcing on some level, the outcome
> is hardly optimal and, in my opinion, does not favor the existing open source
> community here (HBase) in the short term; any long-term favor is going
> to require work by that community.
>
> > My personal preference for a long while has been to migrate
> > our Accumulo implementation to HBase, but as with any project there are
> > often non-technical considerations for doing so.
>
>
> I can only hope that open source communities in general will apply a
> penalty for taking the easy way out for such non-technical considerations.
> We do not have to act as beggars. Presumably this open sourcing was not done
> out of charity -- I would be quite surprised, maybe shocked. If government
> (or contractors) want to leverage open source communities for some benefit,
> the least we can do is insist on respectful terms.
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> ----- Original Message -----
> > From: Duane Moore <duane.moore@issinc.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Cc:
> > Sent: Tuesday, September 6, 2011 9:21 AM
> > Subject: Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up
> > on Apache Incubator as a proposal
> >
> > Hello all,
> >
> > I've been a lurker on the HBase list for a year or so and our company has
> > also been working with the Accumulo implementation during the same time
> > frame.  I'd like to respond to Stack's suggestion to focus on the
> > technical merits of the proposal.  Since I have some info on the pre-open
> > sourced version of Accumulo, I'd like to share some of our evaluation of
> > the software, primarily from a client perspective (vs. implementation
> > details like logging to NFS vs HDFS).
> >
> > First, I share many of the same concerns of folks who were frustrated that
> > this project seems to duplicate the effort of the open source
> > (particularly HBase) community.  However, I will second what Todd and Joey
> > said and reiterate that contributing to open source is not easy for a
> > government contractor, and especially not easy for U.S. government
> > employees.  My personal preference for a long while has been to migrate
> > our Accumulo implementation to HBase, but as with any project there are
> > often non-technical considerations for doing so.
> >
> > Below are some notes we took last year on the differences between Accumulo
> > and HBase, with additional notes from me inline.  Much of this mirrors
> > what is in the current Accumulo proposal.
> >
> > -----
> >
> > - Column Families
> > In HBase you must specify all column families up front as part of the
> > table schema declaration when creating a table.
> > Accumulo does not have this restriction: you do not declare column
> > families when you create a table. When you insert a new row into the
> > table you can just provide a new column family.
> > ** Note: from what Stack said, it sounds like this is close to being OBE?
> >
> >
> > - Aggregation
> > Accumulo offers the ability to specify an aggregator for an individual
> > column family or column. This allows you to keep a row count, or a
> > summation of numerical values that may be stored in a particular column.
> > It would appear the function has to operate on the subset of values stored
> > for that column in the table at a particular time, since it keeps the
> > aggregate value in memory. So this may not be able to handle certain
> > aggregation functions, like 'median' for instance. But functions like sum,
> > max, min, mean, and count should all be supportable.
> > I could not find a comparable feature within HBase, but HBase does offer
> > an atomic function called incrementColumnValue on the HTable class, which
> > it appears can be leveraged to provide aggregation behavior.
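To illustrate why sum, max, min, mean, and count fit an aggregator that keeps only a small running value while 'median' does not, here is a minimal sketch in plain Java. It is not Accumulo's aggregator interface or any HBase API; the class name RunningAggregate and its methods are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch: NOT Accumulo's aggregator API and NOT an HBase class.
// It shows aggregates that need only constant-size state per column.
final class RunningAggregate {
    private long count;
    private long sum;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    // Fold one more value into the running state.
    void add(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // Combine two partial aggregates (for example, two subsets of a column's values).
    void merge(RunningAggregate other) {
        count += other.count;
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    long count() { return count; }
    long sum() { return sum; }
    long min() { return min; }
    long max() { return max; }

    // Mean is derived from sum and count; NaN when nothing has been added.
    double mean() { return count == 0 ? Double.NaN : (double) sum / count; }
}
```

Each of these aggregates can be updated or merged from a fixed handful of numbers. An exact median has no such constant-size summary: it needs the values themselves, which is consistent with the concern raised above.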
> >
> >
> > - Column Visibility
> > This is the feature in Accumulo that allows tagging of the data at the
> > column level, which would primarily be used for classification markings
> > (in our scenario).
> > If we were to implement the same type of column visibility in HBase that
> > Accumulo supports, we would potentially have several options:
> > -Try to implement column visibility as a patch to HBase. Would be fun, but
> > may be a lot of work.
> > -Since the value of a particular column (cell, actually) is simply a byte
> > array, we could utilize a standard technique of encoding the visibility
> > level/classification in the column value itself.
> > -Since the number of columns is not pre-defined, adopt a convention
> > whereby each column "foo" gets an additional column added by our
> > infrastructure called "foo_visibility".
> > ** Note: We have a requirement to use PKI (digital certificates) for
> > authentication in our service stack. The relationship between PKI and
> > Kerberos currently used for Secure HBase is interesting; not quite sure
> > how the two would fit together in practice.
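The second and third options above (encoding the visibility label in the cell value, or adding a companion "foo_visibility" column) can be sketched in plain Java. This is only an illustration under assumptions: the length-prefixed layout, the class name LabeledValue, and its methods are hypothetical, and nothing here is an HBase or Accumulo API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: a client-side convention, not an HBase or Accumulo API.
final class LabeledValue {
    // Assumed layout: [4-byte label length][label bytes, UTF-8][payload bytes].
    static byte[] encode(String label, byte[] payload) {
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + labelBytes.length + payload.length)
                .putInt(labelBytes.length)
                .put(labelBytes)
                .put(payload)
                .array();
    }

    // Read the visibility label back out of an encoded cell value.
    static String label(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        byte[] labelBytes = new byte[buf.getInt()];
        buf.get(labelBytes);
        return new String(labelBytes, StandardCharsets.UTF_8);
    }

    // Read the original payload, skipping the length prefix and the label.
    static byte[] payload(byte[] cell) {
        int labelLength = ByteBuffer.wrap(cell).getInt();
        return Arrays.copyOfRange(cell, 4 + labelLength, cell.length);
    }

    // Third option: derive the companion column name for column "foo".
    static String visibilityColumn(String column) {
        return column + "_visibility";
    }
}
```

With either convention the store itself knows nothing about the label, so filtering by visibility would have to happen in whatever code reads the cells.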
> >
> > - Retrieving Data
> > Accumulo uses a Scanner object for all retrieval operations; a Scanner is
> > obtained from the Connector object. When retrieving all values for a
> > particular row, the _individual cells are each returned as a separate
> > entry_ by the Scanner iterator.
> > In HBase, you can use a Scan object (org.apache.hadoop.hbase.client.Scan)
> > or you can use a Get object, which allows you to retrieve a single row at
> > a time. In either case, an org.apache.hadoop.hbase.client.Result object is
> > returned, representing all of the requested data for that particular row.
> > In HBase, to set constraints on a query, you set an
> > org.apache.hadoop.hbase.filter.Filter object on the Scan object. Multiple
> > Filters may be set by using the FilterList object. In Accumulo, you call
> > the setScanIterators() method on the Scanner object, which enables the
> > appropriate iterators for use on the server before returning data.
> > ** Note: primary difference here is in the use of server-side iterators,
> > which Andy has correctly pointed out could be implemented via the
> > coprocessor framework.  We did some initial investigation into
> > coprocessors to see if we could implement this equivalent functionality,
> > but since we'd been directed to use Accumulo, we didn't have much
> > bandwidth to address this (also coprocessors were in their infancy at the
> > time).
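The difference in result shape described above (one entry per cell versus one result per row) can be illustrated with a small plain-Java sketch. The Cell class and groupByRow method are hypothetical stand-ins, not Accumulo's key/value entries or HBase's Result class; the sketch only shows how a per-cell stream would be folded into per-row groups on the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: stand-in types, not the Accumulo or HBase client classes.
final class CellsToRows {
    // One scanned cell: row key, column name, and value (strings for brevity).
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Fold a per-cell stream into a row -> (column -> value) view,
    // preserving the order in which rows and columns were first seen.
    static Map<String, Map<String, String>> groupByRow(Iterable<Cell> cells) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (Cell cell : cells) {
            rows.computeIfAbsent(cell.row, r -> new LinkedHashMap<>())
                .put(cell.column, cell.value);
        }
        return rows;
    }
}
```

A client that wants row-at-a-time results from a per-cell iterator would need a step along these lines, whereas a per-row result already arrives grouped.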
> >
> >
> >
> > -----
> >
> >
> > Hope that helps.  Bottom line is that I believe that the features in
> > Accumulo can and ought to be merged into HBase at some point (assuming the
> > technical merits hold up).  Looking forward to contributing to that
> > conversation.
> >
> > Thanks,
> > Duane
> >
> > On 9/3/11 2:21 PM, "Stack" <stack@duboce.net> wrote:
> >
> >>
> >> I'd suggest we refocus this thread on how to respond to the Accumulo
> >> proposal (or whether to respond at all), since that's what we 'know'.
> >> I think it'd be useful correcting at least the 'unlikely tos' with
> >> pointers to committed code.
> >>
> >> Code overlap, if any, can be addressed when the code drop happens.
> >>
> >> St.Ack
> >>
> >
>
