accumulo-user mailing list archives

From Vicky Kak <vicky....@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Thu, 10 Jul 2014 05:52:43 GMT
Have you started on the blog post? If not, I can spare some time to
write about it.
The high ingestion and scan speed wrt HBase is an eye-opener for me; I would
be interested in digging deeper into it myself.
Does anyone have information about ingestion/scan speed with Cassandra?



On Tue, Jun 24, 2014 at 1:58 AM, Josh Elser <josh.elser@gmail.com> wrote:

> Noted: I'll add it to the top of my "to blog" queue. If anyone else wants
> to do a write-up, I'm happy to help.
>
>
> On 6/23/14, 4:23 PM, Donald Miner wrote:
>
>> This needs to be documented on the official blog.
>>
>>
>> On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>     Sent too quickly..
>>
>>     - The BatchScanner is communicating with tservers in *parallel*, which
>>     is where it really shows its strength.
>>
>>     - A "default" locality group. You don't have to define the locality
>>     groups for a table at creation time in Accumulo (or have to modify
>>     the table if you want to insert a new column family). Because of
>>     this, you have a lot more flexibility in how you structure your
>>     tables while also being able to take advantage of the efficient
>>     filtering you get from the locality groups you have configured. Adding
>>     a new locality group does still require a compaction to re-write the
>>     data in separate files.
>>
>>
>>     On 6/23/14, 3:24 PM, Josh Elser wrote:
>>
>>         A few observations I can make from watching both communities
>>         (although only really participating in Accumulo's).
>>
>>         - HBase undeniably has a much larger public community of both
>>         users and developers; however, we are seeing broader adoption
>>         across different vertical markets with Accumulo. IMO, we have a
>>         rather responsive community built up here. Lots of smart people
>>         working on it are available and happy to help with problems.
>>
>>         - BatchScanner: The BatchScanner is a query construct which will
>>         automatically fetch data from a collection of Ranges on a table and
>>         return the results in the form of a Java Iterator. This makes for a
>>         very natural way to read lots of data from Accumulo, automatically
>>         performing some reduction in the data server-side (using Accumulo
>>         Iterators), and getting a wonderfully simple
>>         Iterator<Entry<Key,Value>> in your client code. It really helps to
>>         encourage a stateless, functional-like style in your code.
>>
>>         I really like it, and, when combined with the ability to push a
>>         bunch of work server-side, it has often kept me from having to
>>         write MapReduce jobs (which is always a win to me).
>>
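To make the BatchScanner description above a bit more concrete, here is a toy model in Python. It is not the Accumulo client API: `SERVERS`, `scan_server` and `batch_scan` are made-up names, and the "tablet servers" are just sorted in-memory lists. It only illustrates the shape of the idea: hand over a collection of ranges, let the servers be queried in parallel, and consume a plain iterator of key/value pairs.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for tablet servers: each holds sorted (key, value) pairs.
SERVERS = [
    [("a1", 1), ("a2", 2), ("b1", 3)],
    [("m1", 4), ("m2", 5)],
    [("x1", 6), ("z9", 7)],
]


def scan_server(entries, ranges):
    """Return the entries on one server whose key falls in any [start, end) range."""
    keys = [k for k, _ in entries]
    hits = []
    for start, end in ranges:
        lo = bisect_left(keys, start)
        hi = bisect_left(keys, end)
        hits.extend(entries[lo:hi])
    return hits


def batch_scan(ranges, servers=SERVERS, threads=3):
    """Yield matching (key, value) pairs as each server finishes.

    Results arrive in completion order, so no global sort order is promised.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(scan_server, s, ranges) for s in servers]
        for done in as_completed(futures):
            yield from done.result()


# The caller only sees an iterator; sorting here is just for stable printing.
results = sorted(batch_scan([("a2", "c"), ("m2", "n"), ("x", "y")]))
print(results)
```

The client code stays a simple loop over an iterator, which is the stateless style described above. As far as I know the real BatchScanner likewise makes no promise about the order in which entries come back, which is why the sketch sorts before printing.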
>>         - Accumulo Iterators are a common thing you might hear as a
>>         difference. AFAICT, they're a bit more powerful than what you can
>>         do with HBase filters because you are presented with a stream of
>>         Key-Value pairs inside of the TServer. Again, it's a bit functional
>>         programming inspired. You have the ability to combine, consume,
>>         seek within the stream and do what you please (more context would
>>         be helpful in giving specific examples).
>>
>>         That being said, Iterators do come with a learning curve, but
>>         that's to be expected with the amount of flexibility they provide.
>>         It's just like anything else :)
>>
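The "stream of Key-Value pairs" idea above can be sketched with plain Python generators. This is a toy model, not the Accumulo Iterator interface: `summing_iterator` and `row_prefix_filter` are hypothetical names, keys are simple tuples, and timestamps are left out for brevity. The only assumption carried over is that the stream arrives sorted by key.

```python
from itertools import groupby


def row_prefix_filter(stream, prefix):
    """Pass through only the entries whose row starts with `prefix`."""
    return (kv for kv in stream if kv[0][0].startswith(prefix))


def summing_iterator(stream):
    """Collapse consecutive entries that share a key into one summed entry.

    `stream` must already be sorted by key, since groupby only merges neighbours.
    """
    for key, group in groupby(stream, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Key = (row, family, qualifier, visibility), listed in sorted order.
stream = [
    (("user1", "stats", "clicks", "private"), 10),
    (("user1", "stats", "clicks", "public"), 3),
    (("user1", "stats", "clicks", "public"), 4),
    (("user2", "stats", "clicks", "public"), 1),
]

# Stages stack on top of each other, each consuming the stream below it.
combined = list(summing_iterator(row_prefix_filter(stream, "user1")))
print(combined)
```

Because the visibility is part of the key in this model, the `private` and `public` counts are never summed together; only the reduced entries would travel back to the client.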
>>         - <disclaimer>I can't comment about running HBase in production
>>         environments, but I tend to hear a lot of "war stories" about it.
>>         I also don't know how much of this is from running old versions of
>>         HBase which don't have known issues patched.</disclaimer>
>>
>>         In my experience, Accumulo just works. It doesn't require much
>>         day-to-day interaction, processes stay running, and if some node
>>         goes haywire, I have absolutely no qualms about `kill -9`'ing it
>>         and knowing that everything will come back fine.
>>
>>         My $0.02.
>>
>>         - Josh
>>
>>         On 6/23/14, 2:49 PM, Josh Elser wrote:
>>
>>             Another way you could word this is that Accumulo has a very
>>             "mature" security implementation, whereas, like you pointed
>>             out, HBase has only recently added this in 0.98.
>>
>>             The note about visibility being in the Key as opposed to the
>>             Value also has an impact when writing Iterators. Because the
>>             visibility is a "first class citizen" instead of an
>>             afterthought, having it uniquely define some pair makes
>>             aggregations much easier to think about, IMO. This is
>>             especially true when doing this server-side with an Accumulo
>>             Iterator.
>>
>>             There are also other differences between the implementations'
>>             visibility filtering, the most common being the support of a
>>             "NOT" operator in HBase, whereas Accumulo explicitly chose not
>>             to implement this. By allowing "NOT" into the syntax, it
>>             becomes much more likely that data is inadvertently leaked.
>>             Marking data correctly is more difficult than it seems, and
>>             introducing the ability to negate certain branches makes it
>>             even more difficult. Auditors are scary :)
>>
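As a rough illustration of visibility filtering without a "NOT" operator, here is a small Python evaluator for label expressions built from `&`, `|` and parentheses only. It is a sketch and not Accumulo's implementation: the function names are hypothetical, the token rules are simplified, and the two conventions it assumes (an empty expression is visible to everyone, and mixing `&` with `|` needs parentheses) reflect my understanding of Accumulo's behaviour rather than anything stated in this thread.

```python
import re

# A label, or one of the operators & | ( ). There is deliberately no "!" token.
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:-]+|[&|()])")


def tokenize(expr):
    """Split an expression into labels and operators, rejecting anything else."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr, auths):
    """Return True if a reader holding the labels in `auths` satisfies `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: empty visibility means visible to all
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def expression():
        nonlocal pos
        value = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

With only `&` and `|`, granting a reader an extra label can never hide a cell that was visible before, which makes the effect of a marking easier to reason about. An expression such as `!secret` is simply rejected by `tokenize`.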
>>             - Josh
>>
>>             On 6/23/14, 2:34 PM, Aaron wrote:
>>
>>                 I'm not sure of all the differences, but wrt HBase Cell
>>                 Level Security (CLS): while similar, it is not 100% the
>>                 same. If I understand how HBase CLS works, it's an
>>                 extension to the ACL system, and that ACL is "applied" to
>>                 a cell. In Accumulo's case, it is part of the key. So the
>>                 ramification is that in Accumulo, you can have:
>>
>>                 RowID, CF, CQ, VIS1, TS --> Value1
>>                 RowID, CF, CQ, VIS2, TS --> Value2
>>
>>                 If everything is the same, including the timestamp, the
>>                 visibility can actually determine which value to return.
>>                 So, a more concrete example would be:
>>
>>                 XXX, METADATA, NAME, everyone, 100 --> Bruce Wayne
>>                 XXX, METADATA, NAME, alfred-only, 100 --> Batman
>>
>>                 Where Alfred could/would see both "values", but everyone
>>                 else would only see "Bruce Wayne".
>>
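The Bruce Wayne/Batman cells above can be modelled in a few lines of Python. This is a toy dictionary, not Accumulo: the `scan` helper is hypothetical, and each visibility is treated as a single label rather than a full expression.

```python
# The visibility label is part of the key, so two cells that agree on
# row, family, qualifier and timestamp can still coexist.
table = {
    ("XXX", "METADATA", "NAME", "everyone", 100): "Bruce Wayne",
    ("XXX", "METADATA", "NAME", "alfred-only", 100): "Batman",
}


def scan(table, auths):
    """Return, in key order, the values whose visibility label is in `auths`."""
    return [value for key, value in sorted(table.items()) if key[3] in auths]


print(scan(table, {"everyone"}))
print(scan(table, {"everyone", "alfred-only"}))
```

A reader holding only `everyone` gets one value back, while a reader who also holds `alfred-only` gets both, since neither cell overwrote the other.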
>>                 Hope that helps.
>>
>>                 Cheers,
>>                 Aaron
>>
>>                 PS: this is my understanding of how HBase CLS works, based
>>                 on what I have read/interpreted.
>>
>>
>>
>>                 On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang
>>                 <jianshi.huang@gmail.com> wrote:
>>
>>                      Er... basically I need to explain to my manager why
>>                      we are choosing Accumulo instead of HBase.
>>
>>                      So what are the pros and cons of Accumulo vs. HBase?
>>                      (btw HBase 0.98 also got cell-level security, modeled
>>                      after Accumulo)
>>
>>                      --
>>                      Jianshi Huang
>>
>>                      LinkedIn: jianshi
>>                      Twitter: @jshuang
>>                      Github & Blog: http://huangjs.github.com/
>>
>>
>>
>>
>>
>> --
>> Donald Miner
>>
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com <http://www.clearedgeit.com>
>>
>
