hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Tue, 30 Jun 2015 04:42:57 GMT
Hi Ted

I think the idea here is very similar to the Cross-site Big Table project
that was presented at HBaseCon 2014.

Please find the slide link below:
http://www.slideshare.net/HBaseCon/ecosystem-session-3.
That project also adds client-side wrappers so that the client can
fail over internally when a cluster goes down, automatically
switching to the replicated clusters based on its configuration.  Let
us know if you find this interesting.

Regards
Ram
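
[Editorial sketch, not part of the original message.] The client-side failover idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `FakeCluster`, `FailoverClient`, and `ClusterDown` are hypothetical names invented here, the clusters are in-memory dictionaries, and nothing below is the HBase.MCC or Cross-site Big Table API. It shows one primary plus an ordered list of failover clusters, with gets and puts falling through to the next cluster when one is down.

```python
from typing import Dict, List, Optional


class ClusterDown(Exception):
    """Raised by a simulated cluster that is unavailable."""


class FakeCluster:
    """In-memory stand-in for one cluster (hypothetical, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}
        self.up = True

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            raise ClusterDown(self.name)
        return self.rows.get(key)

    def put(self, key: str, value: str) -> None:
        if not self.up:
            raise ClusterDown(self.name)
        self.rows[key] = value


class FailoverClient:
    """Tries the primary first, then each failover cluster in configured order."""

    def __init__(self, primary: FakeCluster, failovers: List[FakeCluster]) -> None:
        self.clusters = [primary] + list(failovers)

    def _first_available(self, op):
        last_error = None
        for cluster in self.clusters:
            try:
                # Report which cluster served the call alongside its result.
                return cluster.name, op(cluster)
            except ClusterDown as err:
                last_error = err
        raise ClusterDown("all clusters unavailable") from last_error

    def get(self, key: str):
        """Returns (serving cluster name, value or None)."""
        return self._first_available(lambda c: c.get(key))

    def put(self, key: str, value: str) -> str:
        """Returns the name of the cluster that accepted the write."""
        return self._first_available(lambda c: c.put(key, value))[0]


primary, backup = FakeCluster("primary"), FakeCluster("backup")
client = FailoverClient(primary, [backup])
client.put("row1", "a")
backup.rows.update(primary.rows)  # stands in for replication between clusters
primary.up = False                # simulate the primary going down
served_by, value = client.get("row1")
```

In this toy model a read that fails over can only see what has already been copied to the backup, which is why the thread frames the feature as suited to applications that tolerate weaker consistency.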



On Tue, Jun 30, 2015 at 4:01 AM, Ted Malaska <ted.malaska@cloudera.com>
wrote:

> lol I did sorry, this is the right doc
>
>
> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx.pdf
>
> On Mon, Jun 29, 2015 at 6:30 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > I think you may have put up the wrong document? That link goes to product
> > doc.
> >
> >
> > On Mon, Jun 29, 2015 at 3:24 PM, Ted Malaska <ted.malaska@cloudera.com>
> > wrote:
> >
> > > Here is the PDF link.
> > >
> > > https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf
> > >
> > > On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com>
> > > wrote:
> > >
> > > > Michael,
> > > >
> > > > This is the dev list, no sound-bite pitch is needed. We have plenty of
> > > > features that take time to explain the nuance. Please either engage with
> > > > the complexity of the topic or wait for the feature to land and get
> > > > user-accessible documentation. We all get busy from time to time, but
> > > > that's no reason to push a higher burden on those who are currently
> > > > engaged with a particular effort, especially this early in development.
> > > >
> > > > That said, the first paragraph gives a suitable brief motivation
> > > > (slightly rephrased below):
> > > >
> > > > > Some applications require response and availability SLAs that a
> > > > > single HBase cluster cannot meet alone. Particularly for high
> > > > > percentiles, queries to a single cluster can be delayed by e.g. GC
> > > > > pauses, individual server process failure, or maintenance activity.
> > > > > By providing clients with a transparent multi-cluster configuration
> > > > > option we can avoid these outlier conditions by masking these
> > > > > failures from applications that are tolerant to weaker consistency
> > > > > guarantees than HBase provides out of the box.
> > > >
> > > >
> > > > Ted,
> > > >
> > > > Thanks for writing this up! We'd prefer to keep discussion of it on the
> > > > mailing list, so please avoid moving to private webex's.
> > > >
> > > > Would you mind if I or one of the other community members converted the
> > > > design doc to pdf so that it's more accessible?
> > > >
> > > >
> > > >
> > > > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > wrote:
> > > >
> > > > > Why don't we set up a webex to talk out the detail.  What times r u
> > > > > open to talk this week.
> > > > >
> > > > > But to answer your questions.  This is for active active and active
> > > > > failover clusters.  There is a primary and n number of fail overs per
> > > > > client.  This is for gets and puts.
> > > > >
> > > > > There r a number of configs in the doc to define how to failover.  The
> > > > > options allow a couple different use cases.  There is a lot of detail
> > > > > in the doc and I just didn't want to put it all in the email.
> > > > >
> > > > > But honestly I put a lot of time in the doc.   I would love to know
> > > > > what u think.
> > > > > On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com>
> > > > > wrote:
> > > > >
> > > > > > Ted,
> > > > > >
> > > > > > If you can’t do a 30 second pitch, then it's not worth the effort. ;-)
> > > > > >
> > > > > > Look, when someone says that they want to have a single client talk
> > > > > > to multiple HBase clusters, that could mean two very different things.
> > > > > > First, you could mean that you want a single client to connect to an
> > > > > > active/active pair of HBase clusters where they replicate to each
> > > > > > other. (Active / Passive would also be implied, but then you have the
> > > > > > issue of when does the passive cluster go active? )
> > > > > >
> > > > > > Then you have the issue of someone wanting to talk to multiple
> > > > > > different clusters so that they can query the data, create local data
> > > > > > sets which they wish to join, combining data from various sources.
> > > > > >
> > > > > > The second is a different problem from the first.
> > > > > >
> > > > > > -Mike
> > > > > >
> > > > > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hey Michael,
> > > > > > >
> > > > > > > Read the doc please.  It goes through everything at a low level.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Ted Malaska
> > > > > > >
> > > > > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> No down time?
> > > > > > >>
> > > > > > >> So you want a client to go against a pair of active/active hbase
> > > > > > >> instances on tied clusters?
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> Hey Michael,
> > > > > > >>>
> > > > > > >>> The use case is simple "No down time use cases" even in the case of
> > > > > > >>> site failure.
> > > > > > >>>
> > > > > > >>> Now on this statement
> > > > > > >>> "Why not simply manage each connection/context via a threaded child?"
> > > > > > >>>
> > > > > > >>> That is the point, to make that simple, tested, easy, and transparent
> > > > > > >>> for HBase users.
> > > > > > >>>
> > > > > > >>> Ted Malaska
> > > > > > >>>
> > > > > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> So if I understand your goal, you want a client that can connect to
> > > > > > >>>> one or more hbase clusters at the same time…
> > > > > > >>>>
> > > > > > >>>> Ok, so let's walk through the use case and help me understand a couple
> > > > > > >>>> of use cases for this…
> > > > > > >>>>
> > > > > > >>>> Why not simply manage each connection/context via a threaded child?
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > >>>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>> Hey Dev List,
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> My name is Ted Malaska, long time lover and user of HBase. I would like
> > > > > > >>>>> to discuss adding in a multi-cluster client into HBase. Here is the link
> > > > > > >>>>> for the design doc (
> > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> > > > > > >>>>> ) but I have pulled some parts into this main e-mail to give you a high
> > > > > > >>>>> level understanding of its scope.
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> *Goals*
> > > > > > >>>>>
> > > > > > >>>>> The proposed solution is a multi-cluster HBase client that relies on the
> > > > > > >>>>> existing HBase Replication functionality to provide an eventually
> > > > > > >>>>> consistent solution in cases of primary cluster down time.
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> Additional goals are:
> > > > > > >>>>>
> > > > > > >>>>> - Be able to switch between single HBase clusters to Multi-HBase Client
> > > > > > >>>>>   with limited or no code changes.  This means using the
> > > > > > >>>>>   HConnectionManager, Connection, and Table interfaces to hide
> > > > > > >>>>>   complexities from the developer (Connection and Table are the new
> > > > > > >>>>>   classes for HConnection, and HTableInterface in HBase version 0.99).
> > > > > > >>>>> - Offer thresholds to allow developers to decide between degrees of
> > > > > > >>>>>   strongly consistent and eventually consistent.
> > > > > > >>>>> - Support N number of linked HBase Clusters
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> *Read-Replicas*
> > > > > > >>>>> Also note this is in alignment with Read-Replicas and can work with that.
> > > > > > >>>>> This client is multi-cluster where Read-Replicas help us to be multi
> > > > > > >>>>> Region Server.
> > > > > > >>>>>
> > > > > > >>>>> *Replication*
> > > > > > >>>>> You will also see in the document that this works with current
> > > > > > >>>>> replication and requires no changes to it.
> > > > > > >>>>>
> > > > > > >>>>> *Only a Client change*
> > > > > > >>>>> You will also see in the doc this is only a new client, which means no
> > > > > > >>>>> extra code for the end developer, only additional configs to set it up.
> > > > > > >>>>>
> > > > > > >>>>> *Github*
> > > > > > >>>>> This is a github project that shows that this works at:
> > > > > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > > > > >>>>> Note this is only a prototype. When adding it to HBase we will use it as
> > > > > > >>>>> a starting point but there will be changes.
> > > > > > >>>>>
> > > > > > >>>>> *Initial Results:*
> > > > > > >>>>>
> > > > > > >>>>> Red is where our primary cluster has failed and you will see from the
> > > > > > >>>>> bottom two graphs that our puts, deletes, and gets are not interrupted.
> > > > > > >>>>>
> > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > > > > >>>>>
> > > > > > >>>>> Thanks
> > > > > > >>>>> Ted Malaska
> > > > > > >>>>
> > > > > > >>>> The opinions expressed here are mine, while they may reflect a
> > > > > > >>>> cognitive thought, that is purely accidental.
> > > > > > >>>> Use at your own risk.
> > > > > > >>>> Michael Segel
> > > > > > >>>> michael_segel (AT) hotmail.com
> > > > --
> > > > Sean
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
