hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: HBase and Accumulo
Date Fri, 21 Aug 2015 03:53:02 GMT
Hi All

Thanks for all the discussion; I am joining it late.
Apart from the above, the other update is in the replication layer, for
cells that carry visibility tags.  It is now possible to replicate the
labels as Strings.  So the peer cluster need not depend on the ordinal
numbering, which may not be the same in the source and peer clusters.  In
such cases replicating as Strings is easier, because the validation and
ordinal creation can then be done on the peer cluster.
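
To illustrate why the ordinals cannot simply be shipped as they are, here
is a minimal sketch in Python.  It is not HBase code: the two ordinal
tables and both function names are hypothetical, and a real visibility
expression is a boolean expression over labels rather than the flat list
used here.  It also assumes the peer creates an ordinal for a label it has
not seen before.

```python
# Hypothetical label-to-ordinal tables; each cluster numbers its labels independently.
source_ordinals = {"secret": 1, "confidential": 2, "public": 3}
peer_ordinals = {"public": 1, "secret": 2}  # different numbering, no "confidential" yet


def decode_for_replication(cell_ordinals, ordinal_table):
    """Turn the ordinals stored with a cell back into label strings before shipping."""
    by_ordinal = {ordinal: label for label, ordinal in ordinal_table.items()}
    return [by_ordinal[o] for o in cell_ordinals]


def encode_on_peer(labels, ordinal_table):
    """Map incoming label strings to the peer's own ordinals.

    Assumption for this sketch: an unknown label gets the next free ordinal.
    """
    encoded = []
    for label in labels:
        if label not in ordinal_table:
            ordinal_table[label] = max(ordinal_table.values(), default=0) + 1
        encoded.append(ordinal_table[label])
    return encoded


cell_on_source = [1, 2]  # "secret" and "confidential" in the source numbering
shipped = decode_for_replication(cell_on_source, source_ordinals)
cell_on_peer = encode_on_peer(shipped, peer_ordinals)
print(shipped)
print(cell_on_peer)
```

If the ordinals [1, 2] were shipped unchanged, the peer in this sketch
would read ordinal 1 as "public", which is the kind of mismatch that
replicating the label strings avoids.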

@Jerry He
I know you have been actively reviewing and working on the cell-level
features.  So when you come across gaps, it would be better to raise JIRAs
and we can try to close them.  The feature may require some updates
after proc V2 goes in, I suppose.

Thanks for a nice discussion.
Regards
Ram

On Thu, Aug 20, 2015 at 10:54 PM, Anoop John <anoop.hbase@gmail.com> wrote:

> >>4) HBase's Cell Level Visibility Expression needs some scale-tests to figure
> >>out the exact boundary, but it will probably have a bad time at ~millions or
> >>~10s of millions of unique labels.
>
> >>Most use cases won't be impacted by this, I expect.
>
> >>5) HBase stores a representation of the visibility expression rather than
> >>the raw expression with each cell.
>
>
> Sean, I know you are talking about the 1.1.1 version.  In trunk we do have
> a way to store the raw visibility expression with cells rather than the
> ordinal-based representation.  By default it is the ordinal-based
> mechanism, with bit positions ON or OFF.  This is done because at scan time
> the expression matching is much faster, as we don't have to evaluate the
> raw expression and then do string matching.  When the total set of labels
> is not so big, this model is much better.  Agreed that there will be an
> issue with millions of labels, for two different reasons.  First, like
> other system tables, the labels table is a single region.  Second, we use
> a zk-based notification bus to sync RSs for labels.  The second one will
> be solved once we rewrite it to use a proc V2 based solution.
> So for such usages with a huge number of labels, we can go with storing
> the raw expression.
>
>
> -Anoop-
>
>
> On Thu, Aug 20, 2015 at 6:35 AM, Jerry He <jerryjch@gmail.com> wrote:
>
> > I definitely agree HBase has a broader base.  Thanks, Ted.
> >
> > Jerry
> >
> > On Wed, Aug 19, 2015 at 4:42 PM, Ted Malaska <ted.malaska@cloudera.com>
> > wrote:
> >
> > > > I would say most banks are HBase, but there are a few with Accumulo.  I have
> > > > most banks, broker dealers and regulators in my region.  Also I think we are
> > > > talking about the same foreign bank ;)
> > >
> > > Ted Malaska
> > > On Aug 19, 2015 7:15 PM, "Jerry He" <jerryjch@gmail.com> wrote:
> > >
> > > > Hi, folks
> > > >
> > > > Thanks so much for all the responses and comments.
> > > >
> > > > > We don't have or support Accumulo yet.  We support HBase.  There have been
> > > > > requests for Accumulo. Like Ted said, almost all are from the Federal sector
> > > > > and banks (even foreign banks).
> > > > > They seem to have references or reference implementations for their use
> > > > > cases.  My work of persuasion for HBase has not been very successful.
> > > > >
> > > > > I had looked into the HBase cell security. There are maybe some differences
> > > > > and misses like Sean mentioned. I think overall the visibility coverage
> > > > > plus the ACL are great.
> > > > >
> > > > > Technology aside, Accumulo's reputation in the specific areas it is good at
> > > > > is probably there.
> > > > >
> > > > > It will probably be a slowly evolving process ...
> > > >
> > > > Jerry
> > > >
> > > >
> > > >
> > > > > On Wed, Aug 19, 2015 at 3:54 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > >
> > > > > > I'm on the side of benchmarking for the use case and with an expert.
> > > > > > There are so many ways to cheat a benchmark.  And the benchmark may not be
> > > > > > anything like your use case.
> > > > > > On Aug 19, 2015 5:43 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
> > > > >
> > > > > > > I think someone who uses third party benchmarks to assess a system like
> > > > > > > HBase or Accumulo (or Cassandra...) is taking a foolish shortcut, so
> > > > > > > perhaps we must agree to disagree.
> > > > > >
> > > > > >
> > > > > > > On Wed, Aug 19, 2015 at 2:34 PM, Jeremy Kepner <kepner@ll.mit.edu> wrote:
> > > > > >
> > > > > > > > I agree that performance on real apps is the most important thing for
> > > > > > > > any particular organization, but as technologists how do we measure
> > > > > > > > ourselves?
> > > > > > > > Hence imperfect benchmarking remains our only recourse.
> > > > > > >
> > > > > > > > On Wed, Aug 19, 2015 at 12:34:44PM -0700, Andrew Purtell wrote:
> > > > > > > > > I can't speak for anyone other than myself in the HBase community, but I'm
> > > > > > > > > much more interested and focused on performance analysis and
> > > > > > > > > developing/deploying for the use cases of my employer than participating in
> > > > > > > > > generic bench-marketing to make weapons for happy OSS warriors.  Perhaps
> > > > > > > > > this does a disservice to the HBase project overall and if so then I
> > > > > > > > > apologize to others on the project for that.
> > > > > > > >
> > > > > > > > > That said, from long and bitter experience let me state that the only
> > > > > > > > > benchmarks that ever really matter are the comparative benchmarks you make
> > > > > > > > > for your own use cases in your own environments, preferably exercising those
> > > > > > > > > candidates with real data and operating conditions.  See:
> > > > > > > > > https://pbs.twimg.com/media/CMnTyKVUEAA1tOm.jpg (smile)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Wed, Aug 19, 2015 at 12:27 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > Alright, I have to ask... are you referring to the paper that cites
> > > > > > > > > > Accumulo performance without write-ahead logs enabled? I have some serious
> > > > > > > > > > reservations about the relevance of that paper to this conversation and
> > > > > > > > > > just want to make sure people aren't led astray by what the actual takeaway
> > > > > > > > > > should be.
> > > > > > > > >
> > > > > > > > > Jeremy Kepner wrote:
> > > > > > > > >
> > > > > > > > > >> A big difference between Accumulo and HBase is the published performance
> > > > > > > > > >> numbers.
> > > > > > > > > >> The Accumulo community has done a good job of continuing to publish
> > > > > > > > > >> up-to-date performance numbers in peer-reviewed venues which allow
> > > > > > > > > >> Accumulo to claim best in the world performance.
> > > > > > > > > >>
> > > > > > > > > >> The HBase community hasn't been doing that so much.  It would be great if
> > > > > > > > > >> they did because the HBase points on the graphs are old and it would be
> > > > > > > > > >> good to get new ones.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > >> On Wed, Aug 19, 2015 at 02:30:58PM -0400, Josh Elser wrote:
> > > > > > > > >>
> > > > > > > > > >>> Like I've said many times now, it's relative to your actual problem.
> > > > > > > > > >>> If you don't have that much data (or intend to grow into that much
> > > > > > > > > >>> data), it's not an issue. Obviously, this is the case for you.
> > > > > > > > > >>>
> > > > > > > > > >>> However, it is an architectural difference between the two projects
> > > > > > > > > >>> with known limitations for a single metadata region. It's a
> > > > > > > > > >>> difference, which is what Jerry asked for.
> > > > > > > > >>>
> > > > > > > > >>> Ted Malaska wrote:
> > > > > > > > >>>
> > > > > > > > > >>>> I've been doing HBase for a long time and never had an issue with region
> > > > > > > > > >>>> count limits, and I have clusters with 10s of billions of records.  Maybe
> > > > > > > > > >>>> there would be issues around a couple trillion records, but I have never
> > > > > > > > > >>>> got that high yet.
> > > > > > > > >>>>
> > > > > > > > >>>> Ted Malaska
> > > > > > > > >>>>
> > > > > > > > > >>>> On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > > > > > > > >>>>
> > > > > > > > > >>>>> Oh, one other thing that I should mention (was prompted off-list).
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> (definition time since cross-list now: HBase regions == Accumulo
> > > > > > > > > >>>>> tablets)
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Accumulo will handle many more regions than HBase does now due to a
> > > > > > > > > >>>>> splittable metadata table. While I was told this was a very long and
> > > > > > > > > >>>>> arduous journey to implement correctly (WRT splitting, merges and bulk
> > > > > > > > > >>>>> loading), users with "too many regions" problems are extremely few and
> > > > > > > > > >>>>> far between for Accumulo.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> I was very happy to see effort/design being put into this in HBase.
> > > > > > > > > >>>>> And, just to be fair in criticism/praises, HBase does appear to me to do
> > > > > > > > > >>>>> assignments of regions much faster than Accumulo does on a small
> > > > > > > > > >>>>> cluster (~5-10 nodes). Accumulo may take a few seconds to notice and
> > > > > > > > > >>>>> reassign tablets. I have yet to notice this with HBase (which also could
> > > > > > > > > >>>>> be due to lack of personal testing).
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > > >>>>> Jerry He wrote:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>> Hi, folks
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> We have people that are evaluating HBase vs Accumulo.
> > > > > > > > > >>>>>> Security is an important factor.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> But I think after the Cell security was added in HBase, there is no
> > > > > > > > > >>>>>> more real gap compared to Accumulo.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> I know we have both HBase and Accumulo experts on this list.
> > > > > > > > > >>>>>> Could someone shed more light?
> > > > > > > > > >>>>>> I am looking for real gaps comparing HBase to Accumulo, if there are
> > > > > > > > > >>>>>> any, so that I can be prepared to address them. This is not limited to
> > > > > > > > > >>>>>> the security area.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> There are differences in some features and implementations. But they
> > > > > > > > > >>>>>> don't seem like real 'gaps'.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> Any comments and feedback are welcome.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> Thanks,
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> Jerry
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > >    - Andy
> > > > > > > >
> > > > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > > > > > > (via Tom White)
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >
> > > > > >    - Andy
> > > > > >
> > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > > > > (via Tom White)
> > > > > >
> > > > >
> > > >
> > >
> >
>
