accumulo-user mailing list archives

From Donald Miner <>
Subject Re: How does Accumulo compare to HBase
Date Mon, 23 Jun 2014 20:23:22 GMT
This needs to be documented on the official blog.

On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser <> wrote:

> Sent too quickly..
> - The BatchScanner communicates with tservers in *parallel*, which is
> where it really shows its strength.
> - A "default" locality group. You don't have to define the locality groups
> for a table at creation time in Accumulo (or have to modify the table if
> you want to insert a new column family). Because of this, you have a lot
> more flexibility in how you structure your tables while also being able to
> take advantage of the efficient filtering you get from the locality groups
> you have configured. Adding a new locality group does still require a
> compaction to re-write the data in separate files.
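
A rough, stdlib-only Java sketch of the fan-out idea in the first point above: several ranges are looked up on several servers concurrently and the hits are gathered into one collection. This is a toy model under stated assumptions, not the Accumulo client API. `ParallelRangeFetchDemo`, `Range`, and `fetch` are hypothetical names, and each `TreeMap` merely stands in for the sorted data held by one tserver.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy model only: NOT the Accumulo client API. Each TreeMap stands in for
// the sorted key-value data held by one tablet server.
final class ParallelRangeFetchDemo {

    // Hypothetical range type: start key inclusive, end key exclusive.
    record Range(String start, String end) {}

    // Look up every range on every server, processing the servers in parallel,
    // and gather all matching entries into a single list.
    // Assumption: the ranges do not overlap; overlapping ranges yield duplicates.
    static List<Map.Entry<String, String>> fetch(
            List<TreeMap<String, String>> servers, List<Range> ranges) {
        return servers.parallelStream()
                .flatMap(server -> ranges.stream()
                        .flatMap(r -> server.subMap(r.start(), r.end()).entrySet().stream()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeMap<String, String> serverA = new TreeMap<>(Map.of("a", "1", "c", "3"));
        TreeMap<String, String> serverB = new TreeMap<>(Map.of("m", "13", "z", "26"));
        List<Range> ranges = List.of(new Range("a", "b"), new Range("m", "n"));
        System.out.println(fetch(List.of(serverA, serverB), ranges));
    }
}
```

The caller only ever sees one collection of entries, regardless of how many servers were consulted. Client code should not rely on the order of the gathered entries.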
> On 6/23/14, 3:24 PM, Josh Elser wrote:
>> A few observations I can make from watching both communities (although
>> only really participating in Accumulo's).
>> - HBase undeniably has a much larger public community of both users and
>> developers; however, we are seeing broader adoption across different
>> vertical markets with Accumulo. IMO, we have a rather responsive
>> community built up here. Lots of smart people working here are
>> available and happy to help with problems.
>> - BatchScanner: The BatchScanner is a query construct which will
>> automatically fetch data from a collection of Ranges on a table and
>> return the results in the form of a Java Iterator. This makes for a very
>> natural way to read lots of data from Accumulo, automatically performing
>> some reduction in the data server-side (using Accumulo Iterators), and
>> getting a wonderfully simple Iterator<Entry<Key,Value>> in your client
>> code. It really helps to encourage a stateless, functional
>> style in your code.
>> I really like it, and, when combined with the ability to push a bunch of
>> work server-side, it has often kept me from having to write MapReduce
>> jobs (which is always a win to me).
>> - Accumulo Iterators are a common thing you might hear as a difference.
>> AFAICT, they're a bit more powerful than what you can do with HBase
>> filters because you are presented with a stream of Key-Value pairs
>> inside of the TServer. Again, it's a bit functional programming
>> inspired. You have the ability to combine, consume, seek within the
>> stream and do what you please (more context would be helpful in giving
>> specific examples)
>> That being said, Iterators do come with a learning curve, but that's to
>> be expected with the amount of flexibility they provide. It's just like
>> anything else :)
>> - <disclaimer>I can't comment about running HBase in production
>> environments, but I tend to hear a lot of "war stories" about it. I also
>> don't know how much of this is from running old versions of HBase which
>> don't have known issues patched. </disclaimer>
>> In my experience, Accumulo just works. It doesn't require much
>> day-to-day interaction, processes stay running and if some node goes
>> haywire, I have absolutely no qualms against `kill -9`'ing it and
>> knowing that everything will come back fine.
>> My $0.02.
>> - Josh
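
To make the "stream of Key-Value pairs" point above a little more concrete, here is a minimal stdlib-only Java sketch of a combining step over a key-sorted stream. It is a toy illustration, not the Accumulo iterator API: `SummingStreamDemo` and `collect` are hypothetical names, keys are plain strings, and the only assumption carried over from the discussion is that the input arrives sorted by key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.NoSuchElementException;

// Toy model only: NOT the Accumulo iterator API. It wraps a key-sorted stream
// and sums the values of consecutive entries that share a key.
final class SummingStreamDemo implements Iterator<Entry<String, Long>> {

    private final Iterator<Entry<String, Long>> source;
    private Entry<String, Long> pending; // next unread source entry, or null when drained

    SummingStreamDemo(Iterator<Entry<String, Long>> sortedSource) {
        this.source = sortedSource;
        this.pending = sortedSource.hasNext() ? sortedSource.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Entry<String, Long> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String key = pending.getKey();
        long sum = 0L;
        // Consume the whole run of entries with this key; relies on sorted input.
        while (pending != null && pending.getKey().equals(key)) {
            sum += pending.getValue();
            pending = source.hasNext() ? source.next() : null;
        }
        return Map.entry(key, sum);
    }

    // Hypothetical helper: drain an iterator into a list for inspection.
    static List<Entry<String, Long>> collect(Iterator<Entry<String, Long>> it) {
        List<Entry<String, Long>> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Entry<String, Long>> sorted = List.of(
                Map.entry("row1", 2L), Map.entry("row1", 3L), Map.entry("row2", 7L));
        System.out.println(collect(new SummingStreamDemo(sorted.iterator())));
    }
}
```

Because the wrapper is itself an `Iterator`, the client code still sees a plain stream of entries, just a smaller one. When this kind of reduction runs next to the data, less has to travel back to the client.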
>> On 6/23/14, 2:49 PM, Josh Elser wrote:
>>> Another way you could word this is that Accumulo has a very "mature"
>>> security implementation, whereas, like you pointed out, HBase has only
>>> recently added this in 0.98.
>>> The note about visibility being in the Key as opposed to the Value
>>> also has an impact when writing Iterators. Because the visibility is a
>>> "first class citizen" instead of an afterthought, having it uniquely
>>> define some pair makes aggregations much easier to think about, IMO.
>>> This is especially relevant when doing this server-side with an
>>> Accumulo Iterator.
>>> There are also other differences between the implementations' visibility
>>> filtering, the most common being the support of a "NOT" operator in
>>> HBase whereas Accumulo explicitly chose not to implement this. By
>>> allowing "NOT" into the syntax, it becomes much more possible that data
>>> is inadvertently leaked. Marking data correctly is more difficult than
>>> it seems and introducing the ability to negate certain branches makes it
>>> even more difficult. Auditors are scary :)
>>> - Josh
>>> On 6/23/14, 2:34 PM, Aaron wrote:
>>>> I'm not sure of all the differences, but wrt HBase Cell Level Security
>>>> (CLS): while similar, it's not 100% the same. If I understand how HBase
>>>> CLS works, it's an extension to the ACL system, and that ACL is "applied"
>>>> to a cell. In Accumulo's case, the visibility is part of the key. So the
>>>> ramification is that in Accumulo, you can have:
>>>> RowID, CF, CQ, VIS1, TS --> Value1
>>>> RowID, CF, CQ, VIS2, TS --> Value2
>>>> If everything is the same, including the timestamp, the visibility can
>>>> actually determine which value to return.  So, a more concrete example
>>>> would be:
>>>> XXX, METADATA, NAME, everyone, 100 --> Bruce Wayne
>>>> XXX, METADATA, NAME, alfred-only, 100 --> Batman
>>>> Where Alfred could/would see both "values"...but everyone else would
>>>> only see "Bruce Wayne"
>>>> Hope that helps.
>>>> Cheers,
>>>> Aaron
>>>> PS:  this is my understanding of how HBase CLS works...based on what I
>>>> have read/interpreted.
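
The Bruce Wayne / Batman example above can be mimicked with a small stdlib-only Java sketch. It is a toy model and not the Accumulo API: `VisibilityInKeyDemo`, `Key`, `sampleTable`, and `scan` are hypothetical names, and a visibility label is assumed to be a single token that either is or is not among the reader's authorizations. The point it tries to show is only that, with visibility as part of the key, two entries that differ in nothing else can coexist.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: this is NOT the Accumulo API. It mimics a sorted key-value
// store whose key includes a visibility label.
final class VisibilityInKeyDemo {

    // Hypothetical key type: row, column family, column qualifier, visibility, timestamp.
    record Key(String row, String cf, String cq, String vis, long ts) {}

    // Order by every key field, so entries that differ only in visibility coexist.
    static final Comparator<Key> ORDER = Comparator
            .comparing(Key::row)
            .thenComparing(Key::cf)
            .thenComparing(Key::cq)
            .thenComparing(Key::vis)
            .thenComparingLong(Key::ts);

    static TreeMap<Key, String> sampleTable() {
        TreeMap<Key, String> table = new TreeMap<>(ORDER);
        table.put(new Key("XXX", "METADATA", "NAME", "everyone", 100L), "Bruce Wayne");
        table.put(new Key("XXX", "METADATA", "NAME", "alfred-only", 100L), "Batman");
        return table;
    }

    // Return the values whose visibility label is among the reader's authorizations.
    // Assumption: a label is a single token; real visibility expressions can be richer.
    static List<String> scan(TreeMap<Key, String> table, Set<String> auths) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<Key, String> e : table.entrySet()) {
            if (auths.contains(e.getKey().vis())) {
                visible.add(e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TreeMap<Key, String> table = sampleTable();
        System.out.println(scan(table, Set.of("everyone")));
        System.out.println(scan(table, Set.of("everyone", "alfred-only")));
    }
}
```

Both puts survive because the two keys are distinct. If the visibility were left out of the key, the second put would overwrite the first.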
>>>> On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang <> wrote:
>>>>     Er... basically I need to explain to my manager why we are choosing
>>>>     Accumulo instead of HBase.
>>>>     So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
>>>>     also got cell-level security, modeled after Accumulo)
>>>>     --
>>>>     Jianshi Huang
>>>>     LinkedIn: jianshi
>>>>     Twitter: @jshuang
>>>>     Github & Blog:


Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
