accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Mon, 23 Jun 2014 20:28:35 GMT
Noted: I'll add it to the top of my "to blog" queue. If anyone else 
wants to do a write-up, I'm happy to help.

On 6/23/14, 4:23 PM, Donald Miner wrote:
> This needs to be documented on the official blog.
>
>
> On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Sent too quickly.
>
>     - The BatchScanner communicates with tservers in *parallel*, which
>     is where it really shows its strength.
>
>     - A "default" locality group. You don't have to define the locality
>     groups for a table at creation time in Accumulo (or modify the table
>     when you want to insert a new column family). Because of this, you
>     have a lot more flexibility in how you structure your tables, while
>     still getting the efficient filtering that the locality groups you
>     have configured provide. Adding a new locality group does still
>     require a compaction to rewrite the data into separate files.
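A minimal sketch of the "default" locality group idea, written as a JDK-only toy model rather than against Accumulo's API. The class `LocalityGroupModel` and its methods are hypothetical. Families that no configured group names fall into an implicit default group, so writing a brand-new column family needs no table change, while a scan that fetches only configured families can skip the other groups. In the real client, groups are assigned with `tableOperations().setLocalityGroups(tableName, groups)` (to our understanding of the 1.x API), and, as noted above, existing data is only split into the new groups after a compaction.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Accumulo code: decides which locality group would hold a column family.
final class LocalityGroupModel {
    // Name used here for the implicit group; Accumulo's internal name may differ.
    static final String DEFAULT_GROUP = "<default>";

    private final Map<String, String> familyToGroup = new HashMap<>();

    // Hypothetical helper: declare a named group and the column families it stores.
    void configureGroup(String group, Set<String> families) {
        for (String family : families) {
            familyToGroup.put(family, group);
        }
    }

    // A family never mentioned in a configured group lands in the default group,
    // so writers can introduce new families without touching the table's configuration.
    String groupFor(String family) {
        String group = familyToGroup.get(family);
        return group != null ? group : DEFAULT_GROUP;
    }

    // The groups a scan must read when it fetches only the given families;
    // every other group can be skipped, which is where the filtering win comes from.
    Set<String> groupsToRead(Collection<String> fetchedFamilies) {
        Set<String> groups = new TreeSet<>();
        for (String family : fetchedFamilies) {
            groups.add(groupFor(family));
        }
        return groups;
    }
}
```

For example, after configuring a `meta` group holding `name` and `email`, a scan that fetches only `email` reads just `meta`, while a write to an undeclared family simply goes to the default group.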
>
>
>     On 6/23/14, 3:24 PM, Josh Elser wrote:
>
>         A few observations I can make from watching both communities
>         (although only really participating in Accumulo's).
>
>         - HBase undeniably has a much larger public community of both
>         users and developers; however, we are seeing broader adoption
>         across different vertical markets with Accumulo. IMO, we have
>         built up a rather responsive community here. Lots of smart people
>         are involved who are available and happy to help with problems.
>
>         - BatchScanner: The BatchScanner is a query construct which will
>         automatically fetch data from a collection of Ranges on a table
>         and return the results in the form of a Java Iterator. This makes
>         for a very natural way to read lots of data from Accumulo,
>         automatically performing some reduction of the data server-side
>         (using Accumulo Iterators) and getting a wonderfully simple
>         Iterator<Entry<Key,Value>> in your client code. It really helps to
>         encourage a stateless, functional style in your code.
>
>         I really like it, and, when combined with the ability to push a
>         bunch of work server-side, it has often kept me from having to
>         write MapReduce jobs (which is always a win to me).
>
>         - Accumulo Iterators are a difference you will commonly hear about.
>         AFAICT, they're a bit more powerful than what you can do with HBase
>         filters because you are presented with a stream of Key-Value pairs
>         inside of the TServer. Again, it's a bit functional-programming
>         inspired. You have the ability to combine, consume, and seek within
>         the stream, and do what you please (more context would be helpful
>         in giving specific examples).
>
>         That being said, Iterators do come with a learning curve, but
>         that's to be expected with the amount of flexibility they provide.
>         It's just like anything else :)
>
>         - <disclaimer>I can't comment about running HBase in production
>         environments, but I tend to hear a lot of "war stories" about it.
>         I also don't know how much of this comes from running old versions
>         of HBase that don't have known issues patched.</disclaimer>
>
>         In my experience, Accumulo just works. It doesn't require much
>         day-to-day interaction, processes stay running, and if some node
>         goes haywire, I have absolutely no qualms about `kill -9`'ing it,
>         knowing that everything will come back fine.
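The BatchScanner and server-side Iterator points above fit together, so here is a JDK-only toy model of both, not Accumulo's API. Each requested row range is handled by its own task on a thread pool (standing in for the parallel tserver lookups), and each task streams through its sorted slice of the table, summing the cells of each row before anything is returned, much like a combining iterator would. The class and method names and the `row/qualifier` key encoding are assumptions made for this sketch. In the real client (to our understanding of the 1.x API) you would call `connector.createBatchScanner(tableName, authorizations, numQueryThreads)`, pass ranges to `setRanges(...)`, attach a server-side iterator with `addScanIterator(...)`, and iterate over `Map.Entry<Key,Value>`. Accumulo ships combiners such as `SummingCombiner`, which sum the versions of a single key rather than whole rows as this toy does.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (JDK only, not Accumulo's API) of a BatchScanner whose ranges are
// read in parallel, each with a server-side "combiner" that sums a row's cells.
// Keys are assumed to look like "row/qualifier", with no '/' inside the row.
final class ParallelSummingScan {

    // A half-open row range [startRow, endRow).
    static final class RowRange {
        final String startRow;
        final String endRow;

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // "Server-side" work for one range: stream through the sorted slice and emit
    // one total per row. Sorted order keeps all of a row's cells adjacent.
    static List<Map.Entry<String, Long>> sumRows(NavigableMap<String, Long> table, RowRange range) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String currentRow = null;
        long sum = 0;
        for (Map.Entry<String, Long> cell : table.subMap(range.startRow, true, range.endRow, false).entrySet()) {
            String row = cell.getKey().substring(0, cell.getKey().indexOf('/'));
            if (currentRow != null && !row.equals(currentRow)) {
                out.add(new AbstractMap.SimpleImmutableEntry<String, Long>(currentRow, sum));
                sum = 0;
            }
            currentRow = row;
            sum += cell.getValue();
        }
        if (currentRow != null) {
            out.add(new AbstractMap.SimpleImmutableEntry<String, Long>(currentRow, sum));
        }
        return out;
    }

    // Client side: one task per range on a fixed pool, then a single plain Iterator.
    // The table must not be modified while the scan runs.
    static Iterator<Map.Entry<String, Long>> scan(
            final NavigableMap<String, Long> table, List<RowRange> ranges, int numQueryThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numQueryThreads);
        try {
            List<Future<List<Map.Entry<String, Long>>>> futures = new ArrayList<>();
            for (final RowRange range : ranges) {
                Callable<List<Map.Entry<String, Long>>> task = () -> sumRows(table, range);
                futures.add(pool.submit(task)); // ranges are summed concurrently
            }
            List<Map.Entry<String, Long>> results = new ArrayList<>();
            for (Future<List<Map.Entry<String, Long>>> future : futures) {
                results.addAll(future.get());
            }
            return results.iterator();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a range task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

One design difference worth flagging: this toy returns results grouped in the order the ranges were submitted, because it waits on the futures in order, whereas the real BatchScanner makes no ordering promise across ranges.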
>
>         My $0.02.
>
>         - Josh
>
>         On 6/23/14, 2:49 PM, Josh Elser wrote:
>
>             Another way you could word this is that Accumulo has a very
>             "mature" security implementation, whereas, as you pointed out,
>             HBase only recently added this in 0.98.
>
>             The point about visibility being in the Key as opposed to the
>             Value also has an impact when writing Iterators. Because the
>             visibility is a "first class citizen" instead of an
>             afterthought, having it be part of what uniquely identifies a
>             Key-Value pair makes aggregations much easier to think about,
>             IMO. This is especially true when aggregating server-side with
>             an Accumulo Iterator.
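To make the aggregation point concrete, here is a small JDK-only sketch (a toy model, not Accumulo code; the class name and the single-label visibilities are simplifications). Because the visibility is part of the key being aggregated, counts written under `public` and under `internal` are summed separately, so a reader holding only `public` gets a total built purely from data they may see. If the label lived in the value instead, a server-side sum over the row would blend both into one number, and there would be no correct answer to show that reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model, not Accumulo code: an aggregation whose key includes the visibility
// label, so counts written under different labels are never merged together.
// Visibilities are single labels here to keep the sketch small.
final class VisibilityAwareCounts {
    // Hypothetical composite key: row plus visibility, separated by a NUL char.
    private static String key(String row, String visibility) {
        return row + '\0' + visibility;
    }

    private final Map<String, Long> totals = new HashMap<>();

    // Combine only with earlier counts that share the same row AND visibility.
    void add(String row, String visibility, long count) {
        totals.merge(key(row, visibility), count, Long::sum);
    }

    // Total for `row` as seen by a reader holding `auths`: only the per-label
    // totals whose label the reader holds are included.
    long visibleTotal(String row, Set<String> auths) {
        long sum = 0;
        for (String label : auths) {
            Long perLabel = totals.get(key(row, label));
            if (perLabel != null) {
                sum += perLabel;
            }
        }
        return sum;
    }
}
```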
>
>             There are also other differences between the two
>             implementations of visibility filtering, the most notable being
>             the support of a "NOT" operator in HBase, whereas Accumulo
>             explicitly chose not to implement this. Allowing "NOT" into the
>             syntax makes it much more likely that data is inadvertently
>             leaked. Marking data correctly is more difficult than it seems,
>             and introducing the ability to negate certain branches makes it
>             even more difficult. Auditors are scary :)
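A minimal sketch of evaluating Accumulo-style visibility expressions (labels joined by `&` and `|`, with parentheses), written as a hypothetical JDK-only class rather than Accumulo's own `ColumnVisibility`/`VisibilityEvaluator`; it skips quoted labels and other details, and its rule rejecting `&` mixed with `|` at one level without parentheses mirrors our understanding of Accumulo's syntax. Because there is no NOT operator, every expression is monotone: holding more labels can only reveal more data, and a user with no labels sees only unlabeled data. With a negation such as "NOT secret" the opposite holds, since a user who is missing a label (perhaps through a provisioning mistake) would gain access, which is the inadvertent-leak risk described above.

```java
import java.util.Set;

// Toy evaluator for Accumulo-style visibility expressions; not Accumulo's code.
// Grammar: expr := term ('&' term)* | term ('|' term)* ; term := '(' expr ')' | label
final class VisibilityExpr {
    private final String s;
    private int pos;

    private VisibilityExpr(String s) {
        this.s = s;
    }

    // Returns true if a reader holding `auths` may see data labeled `expression`.
    static boolean canSee(String expression, Set<String> auths) {
        if (expression.isEmpty()) {
            return true; // unlabeled data is visible to everyone
        }
        VisibilityExpr parser = new VisibilityExpr(expression);
        boolean result = parser.parseExpr(auths);
        if (parser.pos != expression.length()) {
            throw new IllegalArgumentException("unexpected character at " + parser.pos);
        }
        return result;
    }

    private boolean parseExpr(Set<String> auths) {
        boolean value = parseTerm(auths);
        char op = 0;
        while (pos < s.length() && (s.charAt(pos) == '&' || s.charAt(pos) == '|')) {
            char next = s.charAt(pos);
            if (op != 0 && next != op) {
                throw new IllegalArgumentException("mixing '&' and '|' needs parentheses at " + pos);
            }
            op = next;
            pos++;
            boolean rhs = parseTerm(auths); // always parsed, so the whole expression is validated
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    private boolean parseTerm(Set<String> auths) {
        if (pos < s.length() && s.charAt(pos) == '(') {
            pos++;
            boolean value = parseExpr(auths);
            if (pos >= s.length() || s.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < s.length() && isLabelChar(s.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a label at " + pos);
        }
        // There is no NOT: holding a label only ever grants access, never removes it.
        return auths.contains(s.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }
}
```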
>
>             - Josh
>
>             On 6/23/14, 2:34 PM, Aaron wrote:
>
>                 I'm not sure of all the differences, but with respect to
>                 HBase cell-level security (CLS): while similar, they're not
>                 100% the same. If I understand how HBase CLS works, it's an
>                 extension to the ACL system, and that ACL is "applied" to a
>                 cell. In Accumulo's case, the visibility is part of the key.
>                 So the ramification is that in Accumulo, you can have:
>
>                 RowID, CF, CQ, VIS1, TS --> Value1
>                 RowID, CF, CQ, VIS2, TS --> Value2
>
>                 If everything is the same, including the timestamp, the
>                 visibility can actually determine which value to return. So,
>                 a more concrete example would be:
>
>                 XXX, METADATA, NAME, everyone,    100 --> Bruce Wayne
>                 XXX, METADATA, NAME, alfred-only, 100 --> Batman
>
>                 Here, Alfred could/would see both "values", but everyone else
>                 would only see "Bruce Wayne".
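Aaron's example can be sketched as a JDK-only toy model (not Accumulo's `Key` class; the names are hypothetical and visibilities are single labels rather than full expressions). Visibility is a component of the sort key, ordered after the qualifier and before the timestamp, so the two `XXX, METADATA, NAME, ..., 100` cells are distinct entries, and each reader is shown only the values whose label they hold. The label `everyone` is treated as an ordinary label every user holds; in Accumulo itself, data with an empty visibility is readable by anyone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

// Toy model of a cell key in which the visibility is part of the key itself.
final class CellKey implements Comparable<CellKey> {
    final String row;
    final String family;
    final String qualifier;
    final String visibility; // a single label in this sketch
    final long timestamp;

    CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    // Sort by row, family, qualifier, visibility, then timestamp with newest first.
    @Override
    public int compareTo(CellKey other) {
        int c = row.compareTo(other.row);
        if (c == 0) c = family.compareTo(other.family);
        if (c == 0) c = qualifier.compareTo(other.qualifier);
        if (c == 0) c = visibility.compareTo(other.visibility);
        if (c == 0) c = Long.compare(other.timestamp, timestamp);
        return c;
    }

    // Values a reader holding `auths` may see, in key order.
    static List<String> visibleValues(SortedMap<CellKey, String> table, Set<String> auths) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<CellKey, String> e : table.entrySet()) {
            if (auths.contains(e.getKey().visibility)) {
                out.add(e.getValue());
            }
        }
        return out;
    }
}
```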
>
>                 Hope that helps.
>
>                 Cheers,
>                 Aaron
>
>                 PS: this is my understanding of how HBase CLS works... based
>                 on what I have read/interpreted.
>
>
>
>                 On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang
>                 <jianshi.huang@gmail.com> wrote:
>
>                      Er... basically I need to explain to my manager why
>                      we're choosing Accumulo instead of HBase.
>
>                      So what are the pros and cons of Accumulo vs. HBase?
>                      (btw HBase 0.98 also got cell-level security, modeled
>                      after Accumulo)
>
>                      --
>                      Jianshi Huang
>
>                      LinkedIn: jianshi
>                      Twitter: @jshuang
>                      Github & Blog: http://huangjs.github.com/
>
>
>
>
>
> --
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
