accumulo-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: How does Accumulo compare to HBase
Date Tue, 24 Jun 2014 16:03:44 GMT
Hi Ted,

CF: maybe dozens
Columns: billions (rowkey = nodeId, CF = event type, CQ = Index+eventId)

Make sense?

Jianshi
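
A minimal sketch of how the key layout above (rowkey = nodeId, CF = event type, CQ = Index+eventId) could serve time-bound queries. It is plain Python over a sorted list, not Accumulo client code. The thread does not say what "Index" contains, so the zero-padded millisecond timestamp is an assumption, and the names make_key and scan_time_range are hypothetical.

```python
from bisect import bisect_left, bisect_right

def make_key(node_id, event_type, timestamp_ms, event_id):
    """Build a (row, column family, column qualifier) tuple.

    Assumption: the "Index" part of the qualifier is a timestamp,
    zero-padded to 13 digits so that string order matches numeric order.
    """
    cq = f"{timestamp_ms:013d}:{event_id}"
    return (node_id, event_type, cq)

def scan_time_range(sorted_keys, node_id, event_type, start_ms, end_ms):
    """Return keys for one node and event type with start_ms <= time <= end_ms.

    sorted_keys must already be sorted, which stands in for the sorted
    key order a table store maintains.
    """
    lo = (node_id, event_type, f"{start_ms:013d}:")
    # "\uffff" sorts after any event id that follows the same timestamp prefix.
    hi = (node_id, event_type, f"{end_ms:013d}:\uffff")
    return sorted_keys[bisect_left(sorted_keys, lo):bisect_right(sorted_keys, hi)]

if __name__ == "__main__":
    keys = sorted([
        make_key("n1", "login", 1000, "e1"),
        make_key("n1", "login", 2000, "e2"),
        make_key("n1", "login", 3000, "e3"),
        make_key("n1", "pay", 1500, "e4"),
        make_key("n2", "login", 1500, "e5"),
    ])
    for key in scan_time_range(keys, "n1", "login", 1000, 2000):
        print(key)
```

Because the time component leads the qualifier, a "from time A to time B" query for one node and event type becomes a single contiguous range over sorted keys rather than a filter over every column.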


On Tue, Jun 24, 2014 at 10:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Jianshi:
> How many column families and columns are you expecting (maximum) in your
> largest table ?
>
> Cheers
>
>
> On Tue, Jun 24, 2014 at 7:29 AM, Jianshi Huang <jianshi.huang@gmail.com>
> wrote:
>
>> Hi David,
>>
>> I did, it's a wonderful piece of work and for reviewing facts in a
>> network it's a great tool. (And Lumify looks really nice)
>>
>> However, my queries are mostly time-bound (from time A to time B), and to
>> make some queries real-time (< 50ms), I have to roll out my own schema and
>> index, to denormalize properties and to incrementally do aggregations. I
>> don't think there are existing solutions in graph databases that can do these.
>>
>> And it's really fun to implement it myself. :)
>>
>> Please correct me if I'm wrong
>>
>> Jianshi
>>
>>
>>
>> On Tue, Jun 24, 2014 at 10:10 PM, David Medinets <
>> david.medinets@gmail.com> wrote:
>>
>>> Did you get a chance to review http://securegraph.org/? SecureGraph is
>>> an API to manipulate graphs, similar to Blueprints. Unlike Blueprints,
>>> every SecureGraph method requires authorizations and visibilities.
>>> SecureGraph also supports multivalued properties as well as property
>>> metadata.
>>>
>>>
>>> On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <jianshi.huang@gmail.com>
>>> wrote:
>>>
>>>> Wow, so many replies and very educational. Thank you all!
>>>>
>>>> I'm working on a Graph backend that I hope the same infrastructure can
>>>> support
>>>>
>>>> 1) interactive graph exploration and queries
>>>>
>>>> Answering what are the interactions among N users from time A to time
>>>> B, and how are users connected (now and before).
>>>>
>>>> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
>>>> network of accounts
>>>>
>>>> Answering questions like: what's the ratio of newly registered accounts
>>>> in my 'connected' (need flexible definition) network, how fast does it
>>>> change; Does the network has path satisfying A(CN) -> B(IT) -> C(US)
where
>>>> the age of path is less than 3 days; etc.
>>>>
>>>> 3) offline simulation of events or offline calculation of new features
>>>> (used for building models), so I need to take snapshots and also save
>>>> point-in-time data
>>>>
>>>> Having them all-in-one in the same infrastructure will greatly simplify
>>>> the implementation.
>>>>
>>>> BTW, I'm working for PayPal, Risk Data Science. (All questions above
>>>> are fake and are not related to PayPal :)
>>>>
>>>> I made a prototype in the last two weeks for purpose 1) and my feeling
>>>> about Accumulo is exactly what many of you have said: it just works! Very
>>>> little admin work, clean and clear documentation and APIs. One thing I
>>>> haven't got right is high-speed ingestion; I only got 100K rows/sec/node,
>>>> but it's already very satisfying. :)
>>>>
>>>> BTW, from Mike's slides it seems HBase is much faster in read
>>>> throughput if the number of columns is small. Any comments? What about
>>>> latency? Can I cache all data in memory in Accumulo to reduce latency for
>>>> cold data (say I just restarted my cluster)?
>>>>
>>>>
>>>> Jianshi
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
>>>> wilhelm.von.cloud@accumulo.net> wrote:
>>>>
>>>>> I think first and foremost, how has writing your application been? Is
>>>>> it something you can easily onboard other people for? Does it seem stable
>>>>> enough? If you can answer those questions positively, I think you have a
>>>>> winning situation.
>>>>>
>>>>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all
>>>>> provide some level of support for Accumulo, so it has the pedigree of other
>>>>> members of the Hadoop ecosystem.
>>>>>
>>>>> Regarding the performance, I think Mike's presentation needs some
>>>>> context. He can definitely provide more context than the rest of us (and
>>>>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>>>>> that out of the box, Accumulo is configured to run on someone's laptop.
>>>>> There are adjustments to be made when running at any scale greater than a
>>>>> dev machine and they may not be documented clearly.
>>>>>
>>>>>
>>>>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <
>>>>> tsluthra@us.ibm.com> wrote:
>>>>>
>>>>>> Mike did a pretty good presentation on performance comparison between
>>>>>> Accumulo / HBase. Again not official IMO but is pretty detailed in the
>>>>>> approach taken and apples-apples comparison
>>>>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Jeremy Kepner <kepner@ll.mit.edu>
>>>>>> To: <user@accumulo.apache.org>
>>>>>> Date: 06/23/2014 07:42 PM
>>>>>> Subject: Re: How does Accumulo compare to HBase
>>>>>> ------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> Performance is probably the largest difference between Accumulo and
>>>>>> HBase.
>>>>>>
>>>>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>>>>> This performance scales well into the hundreds of nodes to deliver
>>>>>> 100M+ entries/sec.
>>>>>>
>>>>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>>>>> literature.
>>>>>> Old data suggests that HBase performance is ~1% of Accumulo
>>>>>> performance.
>>>>>>
>>>>>> In short, one can often replace a 20+ node database with
>>>>>> a single node Accumulo database.
>>>>>>
>>>>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>>>>> > Er... basically I need to explain to my manager why choosing
>>>>>> Accumulo,
>>>>>> > instead of HBase.
>>>>>> >
>>>>>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase
>>>>>> 0.98 also
>>>>>> > got cell-level security, modeled after Accumulo)
>>>>>> >
>>>>>> > --
>>>>>> > Jianshi Huang
>>>>>> >
>>>>>> > LinkedIn: jianshi
>>>>>> > Twitter: @jshuang
>>>>>> > Github & Blog: http://huangjs.github.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jianshi Huang
>>>>
>>>> LinkedIn: jianshi
>>>> Twitter: @jshuang
>>>> Github & Blog: http://huangjs.github.com/
>>>>
>>>
>>>
>>
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
