accumulo-user mailing list archives

From Josh Elser
Subject Re: How does Accumulo compare to HBase
Date Mon, 23 Jun 2014 19:31:40 GMT
Sent too quickly..

- The BatchScanner communicates with tservers in *parallel*, which is 
where it really shows its strength.

- A "default" locality group. You don't have to define the locality 
groups for a table at creation time in Accumulo (or have to modify the 
table if you want to insert a new column family). Because of this, you 
have a lot more flexibility in how you structure your tables while also 
being able to take advantage of the efficient filtering that the locality 
groups you have configured provide. Adding a new locality group does 
still require a compaction to rewrite the data into separate files.
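
As a rough illustration of the locality group point above, here is a toy Python model. It is not Accumulo's storage code: `ToyTable`, `DEFAULT_GROUP`, and every method name are made up for the sketch. It assumes only what the message describes: any column family can be written without declaring it, unconfigured families land in a default group, and a newly configured group only becomes physically separate after a compaction.

```python
from collections import defaultdict

DEFAULT_GROUP = "<default>"  # hypothetical label for the implicit group


class ToyTable:
    """Toy model of locality groups; not Accumulo's real storage code."""

    def __init__(self):
        self.groups = {}    # group name -> set of column families
        self.entries = []   # (row, family, qualifier, value)
        self.files = {}     # group name -> entries, rebuilt by compact()

    def set_locality_group(self, name, families):
        # Changing the configuration does not move existing data by itself.
        self.groups[name] = set(families)

    def write(self, row, family, qualifier, value):
        # Any column family is accepted, configured or not.
        self.entries.append((row, family, qualifier, value))

    def group_of(self, family):
        for name, families in self.groups.items():
            if family in families:
                return name
        return DEFAULT_GROUP

    def compact(self):
        # Rewrite all data so that each group is stored separately.
        files = defaultdict(list)
        for entry in sorted(self.entries):
            files[self.group_of(entry[1])].append(entry)
        self.files = dict(files)

    def scan(self, families):
        # Skip every stored group that holds none of the requested families.
        families = set(families)
        groups_read, hits = [], []
        for name, entries in sorted(self.files.items()):
            if families.isdisjoint(e[1] for e in entries):
                continue
            groups_read.append(name)
            hits.extend(e for e in entries if e[1] in families)
        return groups_read, sorted(hits)


table = ToyTable()
table.write("r1", "meta", "name", "a")
table.write("r1", "content", "body", "b")
table.write("r1", "tags", "t1", "c")   # no family had to be declared up front
table.compact()
before = table.scan({"meta"})          # everything still sits in the default group

table.set_locality_group("small", ["meta"])
table.compact()                        # the rewrite is what separates the data
after = table.scan({"meta"})
```

In this sketch, `before` reads the default group (which also holds `content` and `tags`), while `after` reads only the `small` group and returns the same single entry.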

On 6/23/14, 3:24 PM, Josh Elser wrote:
> A few observations I can make from watching both communities (although
> only really participating in Accumulo's).
> - HBase undeniably has a much larger public community of both users and
> developers; however, we are seeing broader adoption across different
> vertical markets with Accumulo. IMO, I think we have a rather responsive
> community built up here. Lots of smart people are working on it and are
> available and happy to help with problems.
> - BatchScanner: The BatchScanner is a query construct which will
> automatically fetch data from a collection of Ranges on a table and
> return the results in the form of a Java Iterator. This makes for a very
> natural way to read lots of data from Accumulo, automatically performing
> some reduction in the data server-side (using Accumulo Iterators), and
> getting a wonderfully simple Iterator<Entry<Key,Value>> in your client
> code. It really helps to encourage a state-less and functional-like
> style to your code.
> I really like it, and, when combined with the ability to push a bunch of
> work server-side, it has often kept me from having to write MapReduce
> jobs (which is always a win to me).
> - Accumulo Iterators are a common thing you might hear as a difference.
> AFAICT, they're a bit more powerful than what you can do with HBase
> filters because you are presented with a stream of Key-Value pairs
> inside of the TServer. Again, it's a bit functional programming
> inspired. You have the ability to combine, consume, seek within the
> stream and do what you please (more context would be helpful in giving
> specific examples)
> That being said, Iterators do come with a learning curve, but that's to
> be expected with the amount of flexibility they provide. It's just like
> anything else :)
> - <disclaimer>I can't comment about running HBase in production
> environments, but I tend to hear a lot of "war stories" about it. I also
> don't know how much of this is from running old versions of HBase that
> don't have known issues patched. </disclaimer>
> In my experience, Accumulo just works. It doesn't require much
> day-to-day interaction, processes stay running and if some node goes
> haywire, I have absolutely no qualms about `kill -9`'ing it and
> knowing that everything will come back fine.
> My $0.02.
> - Josh
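
To make the BatchScanner and Iterator description above more concrete, here is a small Python simulation. It does not use the Accumulo client API: `ToyTabletServer`, `batch_scan`, and `sum_by_row` are hypothetical names invented for this sketch. It assumes the behavior described in the message: a set of ranges is sent to several servers in parallel, each server reduces its own data before returning it, and the client just consumes an iterator.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_by_row(entries):
    """Stand-in for a server-side iterator: one summed value per row."""
    totals = {}
    for (row, _column), value in entries:
        totals[row] = totals.get(row, 0) + value
    return sorted(totals.items())


class ToyTabletServer:
    """Holds a sorted slice of the table, like one server's share of the data."""

    def __init__(self, entries):
        self.entries = sorted(entries)  # ((row, column), value)

    def scan(self, ranges, iterator):
        # Keep only rows inside a requested range, then reduce on the "server".
        selected = [e for e in self.entries
                    if any(lo <= e[0][0] <= hi for lo, hi in ranges)]
        return iterator(selected)


def batch_scan(servers, ranges, iterator):
    """Query every server in parallel and expose the results as an iterator."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(s.scan, ranges, iterator) for s in servers]
        for future in futures:
            yield from future.result()


servers = [
    ToyTabletServer([(("a", "x"), 1), (("a", "y"), 2), (("b", "x"), 5)]),
    ToyTabletServer([(("m", "x"), 10), (("z", "x"), 7), (("z", "y"), 1)]),
]
results = list(batch_scan(servers, [("a", "a"), ("m", "z")], sum_by_row))
```

The sketch returns results in server order to keep it deterministic; as far as I know, the real BatchScanner makes no ordering guarantee across ranges, so client code should not rely on one.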
> On 6/23/14, 2:49 PM, Josh Elser wrote:
>> Another way you could word this is that Accumulo has a very "mature"
>> security implementation, whereas, like you pointed out, HBase has only
>> recently added this in 0.98.
>> The note about the visibility being in the Key as opposed to the Value
>> also has an impact when writing Iterators. Because the visibility is a
>> "first class citizen" instead of an afterthought, having it uniquely
>> define some pair makes aggregations much easier to think about, IMO.
>> This is especially prevalent when doing this server-side with an
>> Accumulo Iterator.
>> There are also other differences between the implementations' visibility
>> filtering, the most common being the support of a "NOT" operator in
>> HBase whereas Accumulo explicitly chose not to implement this. By
>> allowing "NOT" into the syntax, it becomes much more possible that data
>> is inadvertently leaked. Marking data correctly is more difficult than
>> it seems and introducing the ability to negate certain branches makes it
>> even more difficult. Auditors are scary :)
>> - Josh
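
The point above about aggregation can be sketched in a few lines of Python. This is only an illustration with made-up data and a hypothetical `combine` function, not an Accumulo Iterator. It assumes a key laid out as (row, family, qualifier, visibility, timestamp), so that grouping on everything except the timestamp never mixes values that carry different visibilities.

```python
from itertools import groupby

# ((row, family, qualifier, visibility, timestamp), count): illustrative data only
entries = [
    (("r1", "stats", "hits", "public", 3), 4),
    (("r1", "stats", "hits", "public", 2), 1),
    (("r1", "stats", "hits", "secret", 3), 10),
]


def combine(entries):
    """Sum all versions that share row, column, and visibility."""
    ordered = sorted(entries)
    # Group on the key without its timestamp; visibility stays part of the group.
    for coords, group in groupby(ordered, key=lambda e: e[0][:4]):
        yield coords, sum(value for _key, value in group)


totals = dict(combine(entries))
```

Here the two `public` versions collapse into one total while the `secret` value stays in its own bucket, so a reader without the `secret` authorization never receives a sum that includes it.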
>> On 6/23/14, 2:34 PM, Aaron wrote:
>>> I'm not sure of all the differences, but, wrt HBase Cell Level security
>>> (CLS): while similar, it is not 100% the same.  If I understand how HBase
>>> CLS works, it's an extension to the ACL system, and that ACL is "applied" to a
>>> cell.  In Accumulo's case, it is part of the key.  So the ramification
>>> is that in Accumulo, you can have:
>>> RowID, CF, CQ, VIS1, TS --> Value1
>>> RowID, CF, CQ, VIS2, TS --> Value2
>>> If everything is the same, including the timestamp, the visibility can
>>> actually determine which value to return.  So, a more concrete example
>>> would be:
>>> XXX, METADATA, NAME, everyone,  100--> Bruce Wayne
>>> XXX, METADATA, NAME, alfred-only,  100--> Batman
>>> Where Alfred could/would see both "values"...but, everyone else would
>>> only see "Bruce"
>>> Hope that helps.
>>> Cheers,
>>> Aaron
>>> PS:  this is my understanding of how HBase CLS works...based on what I
>>> have read/interpreted.
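
The Bruce Wayne / Batman example above can be modeled directly. The Python below is a toy, not Accumulo code: `Key` and `scan` are hypothetical, and each visibility is treated as a single label rather than a full boolean expression. It assumes only what the message states: the visibility is part of the key, so two entries that differ in nothing else can coexist, and a reader sees the values whose label they are authorized for.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


# Same row, column, and timestamp; only the visibility differs.
store = {}
store[Key("XXX", "METADATA", "NAME", "everyone", 100)] = "Bruce Wayne"
store[Key("XXX", "METADATA", "NAME", "alfred-only", 100)] = "Batman"


def scan(store, authorizations):
    """Return the values whose visibility label the reader is authorized for."""
    return [value for key, value in sorted(store.items())
            if key.visibility in authorizations]


alfred_view = scan(store, {"everyone", "alfred-only"})
public_view = scan(store, {"everyone"})
```

Both entries survive in `store` because their keys are distinct; Alfred's scan returns both names, and everyone else's returns only "Bruce Wayne".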
>>> On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang wrote:
>>>     Er... basically I need to explain to my manager why we should choose
>>>     Accumulo instead of HBase.
>>>     So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
>>>     also got cell-level security, modeled after Accumulo)
>>>     --
>>>     Jianshi Huang
>>>     LinkedIn: jianshi
>>>     Twitter: @jshuang
>>>     Github & Blog:
