accumulo-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Tue, 24 Jun 2014 14:33:59 GMT
Jianshi:
How many column families and columns are you expecting (maximum) in your
largest table ?

Cheers


On Tue, Jun 24, 2014 at 7:29 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Hi David,
>
> I did. It's a wonderful piece of work, and for reviewing facts in a
> network it's a great tool. (And Lumify looks really nice.)
>
> However, my queries are mostly time-bound (from time A to time B), and to
> make some queries real-time (< 50ms), I have to roll out my own schema and
> index, denormalize properties, and do aggregations incrementally. I
> don't think there's an existing solution among graph databases that can do this.
>
> And it's really fun to implement it myself. :)
>
> Please correct me if I'm wrong
>
> Jianshi
>
>
>
> On Tue, Jun 24, 2014 at 10:10 PM, David Medinets <david.medinets@gmail.com>
> wrote:
>
>> Did you get a chance to review http://securegraph.org/? SecureGraph is
>> an API to manipulate graphs, similar to Blueprints. Unlike Blueprints,
>> every SecureGraph method requires authorizations and visibilities.
>> SecureGraph also supports multivalued properties as well as property
>> metadata.
>>
>>
>> On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <jianshi.huang@gmail.com>
>> wrote:
>>
>>> Wow, so many replies and very educational. Thank you all!
>>>
>>> I'm working on a graph backend, and I hope the same infrastructure can
>>> support:
>>>
>>> 1) interactive graph exploration and queries
>>>
>>> Answering what are the interactions among N users from time A to time B,
>>> and how are users connected (now and before).
>>>
>>> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
>>> network of accounts
>>>
>>> Answering questions like: what's the ratio of newly registered accounts
>>> in my 'connected' (need flexible definition) network, and how fast does it
>>> change; does the network have a path satisfying A(CN) -> B(IT) -> C(US) where
>>> the age of the path is less than 3 days; etc.
>>>
>>> 3) offline simulation of events or offline calculation of new features
>>> (used for building models), so I need to take snapshots and also save
>>> point-in-time data
>>>
>>> Having them all-in-one in the same infrastructure will greatly simplify
>>> the implementation.
>>>
>>> BTW, I'm working for PayPal, Risk Data Science. (All questions above are
>>> fake and are not related to PayPal :)
>>>
>>> I made a prototype in the last two weeks for purpose 1), and my feeling
>>> about Accumulo is exactly what many of you have said: it just works! Very
>>> little admin work, and clean and clear documentation and APIs. One thing I
>>> haven't gotten right is high-speed ingestion: I only got 100K rows/sec/node,
>>> but that's already very satisfying. :)
>>>
>>> BTW, from Mike's slides it seems HBase is much faster in read throughput
>>> if the number of columns is small. Any comments? What about latency? Can I
>>> cache all data in memory in Accumulo to reduce latency for cold data (say I
>>> just restarted my cluster)?
>>>
>>>
>>> Jianshi
>>>
>>>
>>>
>>>
>>> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
>>> wilhelm.von.cloud@accumulo.net> wrote:
>>>
>>>> I think first and foremost, how has writing your application been? Is
>>>> it something you can easily onboard other people for? Does it seem stable
>>>> enough? If you can answer those questions positively, I think you have a
>>>> winning situation.
>>>>
>>>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all
>>>> provide some level of support for Accumulo, so it has the pedigree of other
>>>> members of the Hadoop ecosystem.
>>>>
>>>> Regarding the performance, I think Mike's presentation needs some
>>>> context. He can definitely provide more context than the rest of us (and
>>>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>>>> that out of the box, Accumulo is configured to run on someone's laptop.
>>>> There are adjustments to be made when running at any scale greater than a
>>>> dev machine and they may not be documented clearly.
>>>>
>>>>
>>>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <tsluthra@us.ibm.com>
>>>> wrote:
>>>>
>>>>> Mike gave a pretty good presentation comparing the performance of
>>>>> Accumulo and HBase. Again, not official IMO, but it is pretty detailed in
>>>>> its approach and makes an apples-to-apples comparison:
>>>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>>>
>>>>>
>>>>>
>>>>> From: Jeremy Kepner <kepner@ll.mit.edu>
>>>>> To: <user@accumulo.apache.org>
>>>>> Date: 06/23/2014 07:42 PM
>>>>> Subject: Re: How does Accumulo compare to HBase
>>>>> ------------------------------
>>>>>
>>>>>
>>>>>
>>>>> Performance is probably the largest difference between Accumulo and
>>>>> HBase.
>>>>>
>>>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>>>> This performance scales well into the hundreds of nodes to deliver
>>>>> 100M+ entries/sec.
>>>>>
>>>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>>>> literature.
>>>>> Old data suggests that HBase performance is ~1% of Accumulo
>>>>> performance.
>>>>>
>>>>> In short, one can often replace a 20+ node database with
>>>>> a single node Accumulo database.
>>>>>
>>>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>>>> > Er... basically I need to explain to my manager why choosing Accumulo,
>>>>> > instead of HBase.
>>>>> >
>>>>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98 also
>>>>> > got cell-level security, modeled after Accumulo)
>>>>> >
>>>>> > --
>>>>> > Jianshi Huang
>>>>> >
>>>>> > LinkedIn: jianshi
>>>>> > Twitter: @jshuang
>>>>> > Github & Blog: http://huangjs.github.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>
>>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
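
Editorial note on the time-bound queries ("from time A to time B") discussed in the thread: one common way to serve them from a sorted key-value table is to put a zero-padded timestamp in the row key, so a time window becomes a single contiguous range scan. The sketch below illustrates that idea only. It is not Jianshi's schema and does not use the Accumulo API; `SortedStore`, `edge_row_key`, `interactions_between`, and the `account|timestamp` key layout are all hypothetical names chosen for illustration, and the store is a plain in-memory stand-in for a sorted table.

```python
from bisect import bisect_left, bisect_right


def edge_row_key(account_id: str, ts_millis: int) -> str:
    """Build a row key whose lexicographic order matches time order.

    Hypothetical layout: "<account>|<13-digit zero-padded millis>".
    Zero-padding matters: without it "999" would sort after "1000".
    Assumes account_id itself contains no "|" character.
    """
    return f"{account_id}|{ts_millis:013d}"


class SortedStore:
    """In-memory stand-in for a sorted key-value table (not a real client)."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def scan(self, start_key: str, end_key: str):
        """Return (key, value) pairs with start_key <= key <= end_key."""
        lo = bisect_left(self._keys, start_key)
        hi = bisect_right(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def interactions_between(store: SortedStore, account_id: str,
                         t_start: int, t_end: int):
    """All stored interactions for one account in [t_start, t_end]."""
    return store.scan(edge_row_key(account_id, t_start),
                      edge_row_key(account_id, t_end))


if __name__ == "__main__":
    store = SortedStore()
    store.put(edge_row_key("alice", 1000), "dave")
    store.put(edge_row_key("alice", 2000), "bob")
    store.put(edge_row_key("alice", 3000), "carol")
    store.put(edge_row_key("bob", 1500), "alice")
    for key, peer in interactions_between(store, "alice", 1500, 3000):
        print(key, peer)
```

Because all keys for one account sort together and in time order, the window lookup touches only the matching entries. Denormalized properties or pre-aggregated counters, as mentioned in the thread, would be stored as values under the same kind of key.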
