accumulo-user mailing list archives

From Josh Elser
Subject Re: How does Accumulo compare to HBase
Date Mon, 23 Jun 2014 19:24:03 GMT
A few observations I can make from watching both communities (although 
only really participating in Accumulo's).

- HBase undeniably has a much larger public community of both users and 
developers; however, we are seeing broader adoption across different 
vertical markets with Accumulo. IMO, we have a rather responsive 
community built up here. Lots of smart people work on Accumulo and are 
available and happy to help with problems.

- BatchScanner: The BatchScanner is a query construct which will 
automatically fetch data from a collection of Ranges on a table and 
return the results in the form of a Java Iterator. This makes for a very 
natural way to read lots of data from Accumulo, automatically performing 
some reduction in the data server-side (using Accumulo Iterators), and 
getting a wonderfully simple Iterator<Entry<Key,Value>> in your client 
code. It really helps to encourage a stateless, functional style in 
your code.

I really like it, and, when combined with the ability to push a bunch of 
work server-side, it has often kept me from having to write MapReduce 
jobs (which is always a win to me).
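
To make the "collection of Ranges in, plain Iterator out" shape concrete, 
here is a minimal sketch. It is a toy in-memory model, not the Accumulo 
client API: the class BatchScanSketch, the RowRange type and the 
batchScan method are all hypothetical names made up for illustration, 
and a TreeMap stands in for the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.TreeMap;

// Toy, in-memory stand-in for a sorted table. NOT the Accumulo client API.
class BatchScanSketch {

    // Hypothetical inclusive range over row keys.
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Gather the entries of every range and hand them back as one Iterator.
    // The caller should not rely on any ordering across ranges.
    static Iterator<Entry<String, String>> batchScan(
            TreeMap<String, String> table, List<RowRange> ranges) {
        List<Entry<String, String>> results = new ArrayList<>();
        for (RowRange r : ranges) {
            results.addAll(table.subMap(r.start, true, r.end, true).entrySet());
        }
        return results.iterator();
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row5", "c");
        table.put("row9", "d");

        List<RowRange> ranges = Arrays.asList(
                new RowRange("row1", "row2"),
                new RowRange("row8", "row9"));

        // Client code only ever sees a simple Iterator of entries.
        Iterator<Entry<String, String>> it = batchScan(table, ranges);
        while (it.hasNext()) {
            Entry<String, String> e = it.next();
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The client loop at the end is the part that carries over: no matter how 
many ranges go in, the consuming code is a single pass over an Iterator.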

- Accumulo Iterators are a difference you will commonly hear about. 
AFAICT, they're a bit more powerful than what you can do with HBase 
filters because you are presented with a stream of Key-Value pairs 
inside of the TServer. Again, it's a bit functional-programming 
inspired. You have the ability to combine, consume, and seek within the 
stream and do what you please (more context would be helpful in giving 
specific examples).

That being said, Iterators do come with a learning curve, but that's to 
be expected with the amount of flexibility they provide. It's just like 
anything else :)
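
As a rough illustration of "combining" a sorted stream of Key-Value 
pairs, the sketch below wraps a source iterator and sums the values of 
consecutive entries that share a key. It assumes the source is already 
sorted by key, and it deliberately does not implement Accumulo's actual 
iterator interface: SummingStreamSketch is a hypothetical, stdlib-only 
class that only shows the streaming idea.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.NoSuchElementException;

// Hypothetical wrapper: folds runs of equal keys in a sorted stream into one entry.
class SummingStreamSketch implements Iterator<Entry<String, Long>> {
    private final Iterator<Entry<String, Long>> source;
    private Entry<String, Long> pending; // look-ahead entry not yet folded in

    SummingStreamSketch(Iterator<Entry<String, Long>> source) {
        this.source = source;
        this.pending = source.hasNext() ? source.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Entry<String, Long> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String key = pending.getKey();
        long sum = pending.getValue();
        pending = null;
        // Consume the stream until the key changes, keeping the first
        // entry of the next run as the new look-ahead.
        while (source.hasNext()) {
            Entry<String, Long> e = source.next();
            if (!e.getKey().equals(key)) {
                pending = e;
                break;
            }
            sum += e.getValue();
        }
        return new SimpleImmutableEntry<>(key, sum);
    }

    public static void main(String[] args) {
        List<Entry<String, Long>> sorted = new ArrayList<>();
        sorted.add(new SimpleImmutableEntry<>("a", 1L));
        sorted.add(new SimpleImmutableEntry<>("a", 2L));
        sorted.add(new SimpleImmutableEntry<>("b", 5L));

        Iterator<Entry<String, Long>> combined = new SummingStreamSketch(sorted.iterator());
        while (combined.hasNext()) {
            Entry<String, Long> e = combined.next();
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the wrapper only ever holds one look-ahead entry, it never needs 
the whole data set in memory, which is the property that makes this 
style attractive to run next to the data.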

- <disclaimer>I can't comment about running HBase in production 
environments, but I tend to hear a lot of "war stories" about it. I also 
don't know how much of this comes from running old versions of HBase in 
which known issues have not been patched. </disclaimer>

In my experience, Accumulo just works. It doesn't require much 
day-to-day interaction, processes stay running, and if some node goes 
haywire, I have absolutely no qualms about `kill -9`'ing it and 
knowing that everything will come back fine.

My $0.02.

- Josh

On 6/23/14, 2:49 PM, Josh Elser wrote:
> Another way you could word this is that Accumulo has a very "mature"
> security implementation, whereas, like you pointed out, HBase has only
> recently added this in 0.98.
> The note about visibility being in the Key as opposed to the Value
> also has an impact when writing Iterators. Because the visibility is a
> "first class citizen" instead of an afterthought, having it uniquely
> define some pair makes aggregations much easier to think about, IMO.
> This is especially relevant when doing this server-side with an
> Accumulo Iterator.
> There are also other differences between the implementations' visibility
> filtering, the most common being the support of a "NOT" operator in
> HBase whereas Accumulo explicitly chose not to implement this. By
> allowing "NOT" into the syntax, it becomes much more likely that data
> is inadvertently leaked. Marking data correctly is more difficult than
> it seems, and introducing the ability to negate certain branches makes it
> even more difficult. Auditors are scary :)
> - Josh
> On 6/23/14, 2:34 PM, Aaron wrote:
>> I'm not sure of all the differences, but wrt HBase Cell Level security
>> (CLS): while similar, it is not 100% the same.  If I understand how the
>> HBase CLS works, it's an extension to the ACL system, and that ACL is
>> "applied" to a cell.  In Accumulo's case, it is part of the key.  So the
>> ramification is that in Accumulo, you can have:
>> RowID, CF, CQ, VIS1, TS --> Value1
>> RowID, CF, CQ, VIS2, TS --> Value2
>> If everything is the same, including the timestamp, the visibility can
>> actually determine which value to return.  So, a more concrete example
>> would be:
>> XXX, METADATA, NAME, everyone,    100 --> Bruce Wayne
>> XXX, METADATA, NAME, alfred-only, 100 --> Batman
>> Where Alfred could/would see both "values"...but, everyone else would
>> only see "Bruce"
>> Hope that helps.
>> Cheers,
>> Aaron
>> PS:  this is my understanding of how HBase CLS works...based on what I
>> have read/interpreted.
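
The Bruce Wayne / Batman example above can be sketched in a few lines. 
This is a toy model under loud assumptions, not Accumulo code: the 
visibility is treated as a single label rather than a boolean 
expression, "everyone" is just an ordinary label that every user happens 
to hold, and VisibilityKeySketch, Key, exampleTable and scan are 
hypothetical names. The point it shows is only that two entries 
differing in nothing but visibility can coexist, because the visibility 
is part of the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of "visibility is part of the key". NOT the Accumulo API.
class VisibilityKeySketch {

    // Hypothetical five-part key: row, column family, column qualifier,
    // visibility label, timestamp.
    static final class Key {
        final String row, cf, cq, vis;
        final long ts;

        Key(String row, String cf, String cq, String vis, long ts) {
            this.row = row;
            this.cf = cf;
            this.cq = cq;
            this.vis = vis;
            this.ts = ts;
        }
    }

    // Visibility takes part in the ordering, so it also takes part in
    // key identity. Timestamp order is ascending here purely for simplicity.
    static final Comparator<Key> ORDER = Comparator
            .comparing((Key k) -> k.row)
            .thenComparing(k -> k.cf)
            .thenComparing(k -> k.cq)
            .thenComparing(k -> k.vis)
            .thenComparingLong(k -> k.ts);

    static TreeMap<Key, String> exampleTable() {
        TreeMap<Key, String> table = new TreeMap<>(ORDER);
        table.put(new Key("XXX", "METADATA", "NAME", "everyone", 100L), "Bruce Wayne");
        table.put(new Key("XXX", "METADATA", "NAME", "alfred-only", 100L), "Batman");
        return table;
    }

    // Return the values whose visibility label the caller is authorized for.
    static List<String> scan(TreeMap<Key, String> table, Set<String> auths) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<Key, String> e : table.entrySet()) {
            if (auths.contains(e.getKey().vis)) {
                visible.add(e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TreeMap<Key, String> table = exampleTable();
        Set<String> anyone = new HashSet<>(Arrays.asList("everyone"));
        Set<String> alfred = new HashSet<>(Arrays.asList("everyone", "alfred-only"));
        System.out.println("anyone: " + scan(table, anyone));
        System.out.println("alfred: " + scan(table, alfred));
    }
}
```

If the visibility were stored outside the key, the second put would 
overwrite the first, since row, column and timestamp are identical.
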
>> On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang wrote:
>>     Er... basically I need to explain to my manager why we should
>>     choose Accumulo instead of HBase.
>>     So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
>>     also got cell-level security, modeled after Accumulo)
>>     --
>>     Jianshi Huang
>>     LinkedIn: jianshi
>>     Twitter: @jshuang
>>     Github & Blog:
