accumulo-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase and Accumulo
Date Wed, 19 Aug 2015 21:42:32 GMT
I think someone who uses third party benchmarks to assess a system like
HBase or Accumulo (or Cassandra...) is taking a foolish shortcut, so
perhaps we must agree to disagree.


On Wed, Aug 19, 2015 at 2:34 PM, Jeremy Kepner <kepner@ll.mit.edu> wrote:

> I agree, that performance on real apps is the most important for
> any particular organization, but as technologists how do we measure
> ourselves?
> Hence imperfect benchmarking remains our only recourse.
>
> On Wed, Aug 19, 2015 at 12:34:44PM -0700, Andrew Purtell wrote:
> > I can't speak for anyone other than myself in the HBase community, but I'm
> > much more interested and focused on performance analysis and
> > developing/deploying for the use cases of my employer than participating in
> > generic bench-marketing to make weapons for happy OSS warriors. Perhaps
> > this does a disservice to the HBase project overall and if so then I
> > apologize to others on the project for that.
> >
> > That said, from long and bitter experience let me state the only benchmarks
> > that ever really matter are the comparative benchmarks you make for your
> > own use cases in your own environments, preferably exercising those
> > candidates with real data and operating conditions. See:
> > https://pbs.twimg.com/media/CMnTyKVUEAA1tOm.jpg (smile)
> >
> >
> >
> > On Wed, Aug 19, 2015 at 12:27 PM, Josh Elser <josh.elser@gmail.com> wrote:
> >
> > > Alright, I have to ask... are you referring to the paper that cites
> > > Accumulo performance without write-ahead logs enabled? I have some serious
> > > reservations about the relevance of that paper to this conversation and
> > > just want to make sure people aren't led astray by what the actual takeaway
> > > should be.
> > >
> > > Jeremy Kepner wrote:
> > >
> > >> A big difference between Accumulo and HBase is the published performance
> > >> numbers. The Accumulo community has done a good job of continuing to
> > >> publish up-to-date performance numbers in peer-reviewed venues, which
> > >> allows Accumulo to claim best-in-the-world performance.
> > >>
> > >> The HBase community hasn't been doing that so much. It would be great if
> > >> they did, because the HBase points on the graphs are old and it would be
> > >> good to get new ones.
> > >>
> > >>
> > >>
> > >> On Wed, Aug 19, 2015 at 02:30:58PM -0400, Josh Elser wrote:
> > >>
> > >>> Like I've said many times now, it's relative to your actual problem.
> > >>> If you don't have that much data (or intend to grow into that much
> > >>> data), it's not an issue. Obviously, this is the case for you.
> > >>>
> > >>> However, it is an architectural difference between the two projects
> > >>> with known limitations for a single metadata region. It's a
> > >>> difference, which is what Jerry asked for.
> > >>>
> > >>> Ted Malaska wrote:
> > >>>
> > >>>> I've been doing HBase for a long time and never had an issue with region
> > >>>> count limits and I have clusters with 10s of billions of records. Maybe
> > >>>> there would be issues around a couple trillion records, but never got that
> > >>>> high yet.
> > >>>>
> > >>>> Ted Malaska
> > >>>>
> > >>>> On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > >>>>
> > >>>> Oh, one other thing that I should mention (was prompted off-list).
> > >>>>>
> > >>>>> (definition time since cross-list now: HBase regions == Accumulo
> > >>>>> tablets)
> > >>>>>
> > >>>>> Accumulo will handle many more regions than HBase does now due to a
> > >>>>> splittable metadata table. While I was told this was a very long and
> > >>>>> arduous journey to implement correctly (WRT splitting, merges and bulk
> > >>>>> loading), users with "too many regions" problems are extremely few and
> > >>>>> far between for Accumulo.
> > >>>>>
> > >>>>> I was very happy to see effort/design being put into this in HBase.
> > >>>>> And, just to be fair in criticism/praises, HBase does appear to me to do
> > >>>>> assignments of regions much faster than Accumulo does on a small
> > >>>>> cluster (~5-10 nodes). Accumulo may take a few seconds to notice and
> > >>>>> reassign tablets. I have yet to notice this with HBase (which also could
> > >>>>> be due to lack of personal testing).
> > >>>>>
> > >>>>>
> > >>>>> Jerry He wrote:
> > >>>>>
> > >>>>> Hi, folks
> > >>>>>>
> > >>>>>> We have people that are evaluating HBase vs Accumulo.
> > >>>>>> Security is an important factor.
> > >>>>>>
> > >>>>>> But I think after the Cell security was added in HBase, there is no
> > >>>>>> more real gap compared to Accumulo.
> > >>>>>>
> > >>>>>> I know we have both HBase and Accumulo experts on this list.
> > >>>>>> Could someone shed more light?
> > >>>>>> I am looking for real gaps comparing HBase to Accumulo, if there are any,
> > >>>>>> so that I can be prepared to address them. This is not limited to the
> > >>>>>> security area.
> > >>>>>>
> > >>>>>> There are differences in some features and implementations. But they
> > >>>>>> don't seem like real 'gaps'.
> > >>>>>>
> > >>>>>> Any comments and feedback are welcome.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Jerry
> > >>>>>>
> > >>>>>>
> > >>>>>>
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
