accumulo-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Tue, 24 Jun 2014 13:51:41 GMT
Wow, so many replies and very educational. Thank you all!

I'm working on a graph backend, and I hope the same infrastructure can
support:

1) interactive graph exploration and queries

Answering what are the interactions among N users from time A to time B,
and how are users connected (now and before).

2) real-time (<100ms) feature calculation (aggregation, matching) in a
network of accounts

Answering questions like: what's the ratio of newly registered accounts in
my 'connected' (need flexible definition) network, and how fast does it
change? Does the network have a path satisfying A(CN) -> B(IT) -> C(US)
where the age of the path is less than 3 days? etc.
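
To make the path question concrete, here is a minimal in-memory sketch in
plain Python. It is not Accumulo code and says nothing about how the data
would be laid out in a table. The function name, the edge tuples
(src, dst, created) and the country map are hypothetical, and "age of the
path" is assumed to mean that every edge on it was created less than
max_age ago. The pattern is assumed to be non-empty.

```python
from datetime import datetime, timedelta


def has_recent_path(edges, country, pattern, now, max_age):
    """Return True if some walk visits accounts whose countries match
    `pattern` in order, using only edges created less than `max_age` ago.

    edges   -- iterable of (src, dst, created) tuples (hypothetical layout)
    country -- dict mapping account id to country code
    pattern -- non-empty list of country codes, e.g. ["CN", "IT", "US"]
    """
    cutoff = now - max_age

    # Adjacency list restricted to edges newer than the cutoff.
    adjacency = {}
    for src, dst, created in edges:
        if created > cutoff:
            adjacency.setdefault(src, []).append(dst)

    # Frontier: accounts that can end a walk matching the pattern so far.
    frontier = {acct for acct, code in country.items() if code == pattern[0]}
    for wanted in pattern[1:]:
        frontier = {
            dst
            for src in frontier
            for dst in adjacency.get(src, [])
            if country.get(dst) == wanted
        }
        if not frontier:
            return False
    return bool(frontier)
```

The frontier is expanded one pattern step at a time, so the work is bounded
by the number of recent edges times the pattern length. A real backend would
presumably push the time filter down to the scan rather than loading all
edges first.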

3) offline simulation of events or offline calculation of new features
(used for building models), so I need to take snapshots and also save
point-in-time data

Having them all-in-one in the same infrastructure will greatly simplify the
implementation.

BTW, I'm working for PayPal, Risk Data Science. (All questions above are
fake and are not related to PayPal :)

I made a prototype in the last two weeks for purpose 1) and my feeling
about Accumulo is exactly what many of you have said: it just works! Very
little admin work, and clean and clear documentation and APIs. One thing I
haven't got right is high-speed ingestion; I only got 100K rows/sec/node,
but that's already very satisfying. :)

BTW, from Mike's slides it seems HBase is much faster in read throughput if
the number of columns is small. Any comments? What about latency? Can I
cache all data in memory in Accumulo to reduce latency for cold data (say I
just restarted my cluster)?


Jianshi




On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> I think first and foremost, how has writing your application been? Is it
> something you can easily onboard other people for? Does it seem stable
> enough? If you can answer those questions positively, I think you have a
> winning situation.
>
> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all provide
> some level of support for Accumulo, so it has the pedigree of other members
> of the Hadoop ecosystem.
>
> Regarding the performance, I think Mike's presentation needs some context.
> He can definitely provide more context than the rest of us (and possibly
> Sean or Bill |-|), but I think one thing he was driving home is that out of
> the box, Accumulo is configured to run on someone's laptop. There are
> adjustments to be made when running at any scale greater than a dev machine
> and they may not be documented clearly.
>
>
> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <tsluthra@us.ibm.com>
> wrote:
>
>> Mike did a pretty good presentation on performance comparison between
>> Accumulo / HBase. Again not official IMO but it is pretty detailed in the
>> approach taken and is an apples-to-apples comparison
>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>
>>
>>
>> From: Jeremy Kepner <kepner@ll.mit.edu>
>> To: <user@accumulo.apache.org>
>> Date: 06/23/2014 07:42 PM
>> Subject: Re: How does Accumulo compare to HBase
>> ------------------------------
>>
>>
>>
>> Performance is probably the largest difference between Accumulo and HBase.
>>
>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>> This performance scales well into the hundreds of nodes to deliver
>> 100M+ entries/sec.
>>
>> There are no recent HBase benchmarks and none in the peer-reviewed
>> literature.
>> Old data suggests that HBase performance is ~1% of Accumulo performance.
>>
>> In short, one can often replace a 20+ node database with
>> a single node Accumulo database.
>>
>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>> > Er... basically I need to explain to my manager why I'm choosing
>> > Accumulo instead of HBase.
>> >
>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
>> > also got cell-level security, modeled after Accumulo)
>> >
>> > --
>> > Jianshi Huang
>> >
>> > LinkedIn: jianshi
>> > Twitter: @jshuang
>> > Github & Blog: http://huangjs.github.com/
>>
>>
>>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
