accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Tue, 24 Jun 2014 14:10:37 GMT
Did you get a chance to review http://securegraph.org/? SecureGraph is an
API to manipulate graphs, similar to Blueprints. Unlike Blueprints, every
SecureGraph method requires authorizations and visibilities. SecureGraph
also supports multivalued properties as well as property metadata.
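
To make the authorizations/visibilities idea concrete, here is a toy sketch
in Python of how a visibility label on a graph element could be checked
against a reader's authorizations. This is not SecureGraph's or Accumulo's
actual API: the names `tokenize` and `is_visible`, the sample labels, and the
sample edges are all made up for illustration. The expression syntax (`&`,
`|`, parentheses, no mixing of operators without parentheses) is only
loosely modeled on Accumulo-style labels, and the empty-label-is-public rule
is an assumption of this sketch.

```python
import re

# Hypothetical toy model, not the SecureGraph or Accumulo API.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a label expression into names, operators, and parentheses."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a reader holding `auths` satisfies the label `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


# Hypothetical edges, each carrying its own visibility label.
edges = [
    ("alice", "bob", "risk&(us|cn)"),
    ("bob", "carol", "admin"),
    ("carol", "dave", ""),
]
readable = [(s, d) for s, d, vis in edges if is_visible(vis, {"risk", "cn"})]
```

With the authorizations {"risk", "cn"}, only the first and third edges are
returned; the "admin" edge is filtered out before the caller ever sees it.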


On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Wow, so many replies and very educational. Thank you all!
>
> I'm working on a graph backend, and I hope the same infrastructure can
> support:
>
> 1) interactive graph exploration and queries
>
> Answering what are the interactions among N users from time A to time B,
> and how are users connected (now and before).
>
> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
> network of accounts
>
> Answering questions like: what's the ratio of newly registered accounts in
> my 'connected' (need flexible definition) network, and how fast does it
> change; does the network have a path satisfying A(CN) -> B(IT) -> C(US)
> where the age of the path is less than 3 days; etc.
>
> 3) offline simulation of events or offline calculation of new features
> (used for building models), so I need to take snapshots and also save
> point-in-time data
>
> Having them all-in-one in the same infrastructure will greatly simplify
> the implementation.
>
> BTW, I'm working for PayPal, Risk Data Science. (All questions above are
> fake and are not related to PayPal :)
>
> I made a prototype in the last two weeks for purpose 1) and my feeling
> about Accumulo is exactly what many of you have said: it just works! Very
> little admin work, clean and clear documentation and APIs. One thing I
> haven't got right is high-speed ingestion: I only got 100K rows/sec/node,
> but it's already very satisfying. :)
>
> BTW, from Mike's slides it seems HBase is much faster in read throughput
> if the number of columns is small. Any comments? What about latency? Can I
> cache all data in memory in Accumulo to reduce latency for cold data (say I
> just restarted my cluster)?
>
>
> Jianshi
>
>
>
>
> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
> wilhelm.von.cloud@accumulo.net> wrote:
>
>> I think first and foremost, how has writing your application been? Is it
>> something you can easily onboard other people for? Does it seem stable
>> enough? If you can answer those questions positively, I think you have a
>> winning situation.
>>
>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all provide
>> some level of support for Accumulo, so it has the pedigree of other members
>> of the Hadoop ecosystem.
>>
>> Regarding the performance, I think Mike's presentation needs some
>> context. He can definitely provide more context than the rest of us (and
>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>> that out of the box, Accumulo is configured to run on someone's laptop.
>> There are adjustments to be made when running at any scale greater than a
>> dev machine and they may not be documented clearly.
>>
>>
>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <tsluthra@us.ibm.com>
>> wrote:
>>
>>> Mike did a pretty good presentation on performance comparison between
>>> Accumulo / HBase. Again not official IMO, but it is pretty detailed in the
>>> approach taken and is an apples-to-apples comparison:
>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>
>>>
>>>
>>> From: Jeremy Kepner <kepner@ll.mit.edu>
>>> To: <user@accumulo.apache.org>
>>> Date: 06/23/2014 07:42 PM
>>> Subject: Re: How does Accumulo compare to HBase
>>> ------------------------------
>>>
>>>
>>>
>>> Performance is probably the largest difference between Accumulo and
>>> HBase.
>>>
>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>> This performance scales well into the hundreds of nodes to deliver
>>> 100M+ entries/sec.
>>>
>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>> literature.
>>> Old data suggests that HBase performance is ~1% of Accumulo performance.
>>>
>>> In short, one can often replace a 20+ node database with
>>> a single node Accumulo database.
>>>
>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>> > Er... basically I need to explain to my manager why I'm choosing
>>> > Accumulo instead of HBase.
>>> >
>>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98 also
>>> > got cell-level security, modeled after Accumulo)
>>> >
>>> > --
>>> > Jianshi Huang
>>> >
>>> > LinkedIn: jianshi
>>> > Twitter: @jshuang
>>> > Github & Blog: http://huangjs.github.com/
>>>
>>>
>>>
>>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
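
As a rough sanity check on the throughput figures quoted in this thread, the
sketch below works out how many nodes each per-node rate would need to reach
100M entries/sec. It assumes perfectly linear scaling, which is an
idealization, and it treats Jianshi's "rows/sec" and Jeremy's "entries/sec"
as comparable units, which they may not be. The function name `nodes_needed`
and the `efficiency` parameter are made up for illustration.

```python
import math

# Figures quoted in the thread; everything else here is hypothetical.
QUOTED_ENTRIES_PER_SEC_PER_NODE = 800_000    # Jeremy's per-node rate
PROTOTYPE_ROWS_PER_SEC_PER_NODE = 100_000    # Jianshi's prototype rate
TARGET_ENTRIES_PER_SEC = 100_000_000         # the "100M+" aggregate figure


def nodes_needed(target, per_node, efficiency=1.0):
    """Nodes required to hit `target`, assuming linear scaling.

    `efficiency` (0 < efficiency <= 1) is a hypothetical knob for
    modeling less-than-perfect scaling.
    """
    return math.ceil(target / (per_node * efficiency))


at_quoted_rate = nodes_needed(TARGET_ENTRIES_PER_SEC,
                              QUOTED_ENTRIES_PER_SEC_PER_NODE)
at_prototype_rate = nodes_needed(TARGET_ENTRIES_PER_SEC,
                                 PROTOTYPE_ROWS_PER_SEC_PER_NODE)
print(at_quoted_rate, at_prototype_rate)
```

Under those assumptions, 800K entries/sec/node reaches 100M entries/sec at
125 nodes, which is consistent with the "hundreds of nodes" remark, while
the 100K rows/sec/node prototype rate would need 1000 nodes.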
