accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: HBase and Accumulo
Date Wed, 19 Aug 2015 22:47:21 GMT
Ah right, I did forget about that paper. Thanks for clarifying.

Big +1 to Andy's comments, too.

Jeremy Kepner wrote:
> Turning off the walog was mostly to shorten the benchmarking cycle
> (it allowed us to go from zero to peak ingest in a few seconds).  BAH got
> pretty much the same performance results in their paper,
> it just took longer for their experiments to run.
> So, in this case, we had two different teams doing things different
> ways and getting the same result, which is what we like to see.
> On Wed, Aug 19, 2015 at 03:27:07PM -0400, Josh Elser wrote:
>> Alright, I have to ask... are you referring to the paper that cites
>> Accumulo performance without write-ahead logs enabled? I have some
>> serious reservations about the relevance of that paper to this
>> conversation and just want to make sure people aren't led astray by
>> what the actual takeaway should be.
>> Jeremy Kepner wrote:
>>> A big difference between Accumulo and HBase is the published performance numbers.
>>> The Accumulo community has done a good job of continuing to publish up-to-date
>>> numbers in peer-reviewed venues which allow Accumulo to claim best in the world.
>>> The HBase community hasn't been doing that so much.  It would be great if they
>>> did, because the HBase points on the graphs are old and it would be good to
>>> get new ones.
>>> On Wed, Aug 19, 2015 at 02:30:58PM -0400, Josh Elser wrote:
>>>> Like I've said many times now, it's relative to your actual problem.
>>>> If you don't have that much data (or intend to grow into that much
>>>> data), it's not an issue. Obviously, this is the case for you.
>>>> However, it is an architectural difference between the two projects
>>>> with known limitations for a single metadata region. It's a
>>>> difference, which is what Jerry asked for.
>>>> Ted Malaska wrote:
>>>>> I've been doing HBase for a long time and never had an issue with region
>>>>> count limits and I have clusters with 10s of billions of records.  Maybe
>>>>> there would be issues around a couple trillion records, but I never got
>>>>> that high yet.
>>>>> Ted Malaska
>>>>> On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser wrote:
>>>>>> Oh, one other thing that I should mention (was prompted off-list).
>>>>>> (definition time since cross-list now: HBase regions == Accumulo tablets)
>>>>>> Accumulo will handle many more regions than HBase does now due to a
>>>>>> splittable metadata table. While I was told this was a very long and
>>>>>> arduous journey to implement correctly (WRT splitting, merges and
>>>>>> loading), users with "too many regions" problems are extremely few and far
>>>>>> between for Accumulo.
>>>>>> I was very happy to see effort/design being put into this in HBase. And,
>>>>>> just to be fair in criticism/praises, HBase does appear to me to do
>>>>>> assignments of regions much faster than Accumulo does on a small cluster
>>>>>> (~5-10 nodes). Accumulo may take a few seconds to notice and reassign
>>>>>> tablets. I have yet to notice this with HBase (which also could be due to
>>>>>> lack of personal testing).
>>>>>> Jerry He wrote:
>>>>>>> Hi, folks
>>>>>>> We have people that are evaluating HBase vs Accumulo.
>>>>>>> Security is an important factor.
>>>>>>> But I think after the Cell security was added in HBase, there is no more
>>>>>>> real gap compared to Accumulo.
>>>>>>> I know we have both HBase and Accumulo experts on this list.
>>>>>>> Could someone shed more light?
>>>>>>> I am looking for any real gaps comparing HBase to Accumulo so
>>>>>>> that I can be prepared to address them. This is not limited to
>>>>>>> the security area.
>>>>>>> There are differences in some features and implementations. But they don't
>>>>>>> seem like real 'gaps'.
>>>>>>> Any comments and feedback are welcome.
>>>>>>> Thanks,
>>>>>>> Jerry
