accumulo-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: HBase and Accumulo
Date Wed, 19 Aug 2015 22:54:47 GMT
I'm on the side of benchmarking for the use case and with an expert.  There
are so many ways to cheat a benchmark.  And the benchmark may not be
anything like your use case.
On Aug 19, 2015 5:43 PM, "Andrew Purtell" <apurtell@apache.org> wrote:

> I think someone who uses third party benchmarks to assess a system like
> HBase or Accumulo (or Cassandra...) is taking a foolish shortcut, so
> perhaps we must agree to disagree.
>
>
> On Wed, Aug 19, 2015 at 2:34 PM, Jeremy Kepner <kepner@ll.mit.edu> wrote:
>
> > I agree, that performance on real apps is the most important for
> > any particular organization, but as technologists how do we measure
> > ourselves?
> > Hence imperfect benchmarking remains our only recourse.
> >
> > On Wed, Aug 19, 2015 at 12:34:44PM -0700, Andrew Purtell wrote:
> > > I can't speak for anyone other than myself in the HBase community, but I'm
> > > much more interested and focused on performance analysis and
> > > developing/deploying for the use cases of my employer than participating in
> > > generic bench-marketing to make weapons for happy OSS warriors. Perhaps
> > > this does a disservice to the HBase project overall and if so then I
> > > apologize to others on the project for that.
> > >
> > > That said, from long and bitter experience let me state the only benchmarks
> > > that ever really matter are the comparative benchmarks you make for your
> > > own use cases in your own environments, preferably exercising those
> > > candidates with real data and operating conditions. See:
> > > https://pbs.twimg.com/media/CMnTyKVUEAA1tOm.jpg (smile)
> > >
> > >
> > >
> > > On Wed, Aug 19, 2015 at 12:27 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > >
> > > > Alright, I have to ask... are you referring to the paper that cites
> > > > Accumulo performance without write-ahead logs enabled? I have some serious
> > > > reservations about the relevance of that paper to this conversation and
> > > > just want to make sure people aren't led astray by what the actual takeaway
> > > > should be.
> > > >
> > > > Jeremy Kepner wrote:
> > > >
> > > >> A big difference between Accumulo and HBase is the published performance
> > > >> numbers.
> > > >> The Accumulo community has done a good job of continuing to publish
> > > >> up-to-date performance numbers in peer-reviewed venues which allow
> > > >> Accumulo to claim best in the world performance.
> > > >>
> > > >> The HBase community hasn't been doing that so much.  It would be great if
> > > >> they did because the HBase points on the graphs are old and it would be
> > > >> good to get new ones.
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Aug 19, 2015 at 02:30:58PM -0400, Josh Elser wrote:
> > > >>
> > > >>> Like I've said many times now, it's relative to your actual problem.
> > > >>> If you don't have that much data (or intend to grow into that much
> > > >>> data), it's not an issue. Obviously, this is the case for you.
> > > >>>
> > > >>> However, it is an architectural difference between the two projects
> > > >>> with known limitations for a single metadata region. It's a
> > > >>> difference, which is what Jerry asked for.
> > > >>>
> > > >>> Ted Malaska wrote:
> > > >>>
> > > >>>> I've been doing HBase for a long time and never had an issue with
> > > >>>> region count limits and I have clusters with 10s of billions of records.
> > > >>>> Maybe there would be issues around a couple trillion records, but never
> > > >>>> got that high yet.
> > > >>>>
> > > >>>> Ted Malaska
> > > >>>>
> > > >>>> On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > > >>>>
> > > >>>> Oh, one other thing that I should mention (was prompted off-list).
> > > >>>>>
> > > >>>>> (definition time since cross-list now: HBase regions == Accumulo
> > > >>>>> tablets)
> > > >>>>>
> > > >>>>> Accumulo will handle many more regions than HBase does now due to a
> > > >>>>> splittable metadata table. While I was told this was a very long and
> > > >>>>> arduous journey to implement correctly (WRT splitting, merges and bulk
> > > >>>>> loading), users with "too many regions" problems are extremely few and
> > > >>>>> far between for Accumulo.
> > > >>>>>
> > > >>>>> I was very happy to see effort/design being put into this in HBase.
> > > >>>>> And, just to be fair in criticism/praises, HBase does appear to me to do
> > > >>>>> assignments of regions much faster than Accumulo does on a small
> > > >>>>> cluster (~5-10 nodes). Accumulo may take a few seconds to notice and
> > > >>>>> reassign tablets. I have yet to notice this with HBase (which also could
> > > >>>>> be due to lack of personal testing).
> > > >>>>>
> > > >>>>>
> > > >>>>> Jerry He wrote:
> > > >>>>>
> > > >>>>> Hi, folks
> > > >>>>>>
> > > >>>>>> We have people that are evaluating HBase vs Accumulo.
> > > >>>>>> Security is an important factor.
> > > >>>>>>
> > > >>>>>> But I think after the Cell security was added in HBase, there is no
> > > >>>>>> more real gap compared to Accumulo.
> > > >>>>>>
> > > >>>>>> I know we have both HBase and Accumulo experts on this list.
> > > >>>>>> Could someone shed more light?
> > > >>>>>> I am looking for real gaps comparing HBase to Accumulo, if there are any,
> > > >>>>>> so that I can be prepared to address them. This is not limited to the
> > > >>>>>> security area.
> > > >>>>>>
> > > >>>>>> There are differences in some features and implementations. But they
> > > >>>>>> don't seem like real 'gaps'.
> > > >>>>>>
> > > >>>>>> Any comments and feedback are welcome.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>>
> > > >>>>>> Jerry
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
