hbase-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Mon, 29 Jun 2015 22:24:39 GMT
Here is the PDF link.

https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf

On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com> wrote:

> Michael,
>
> This is the dev list, no sound-bite pitch is needed. We have plenty of
> features whose nuances take time to explain. Please either engage with
> the complexity of the topic or wait for the feature to land and get
> user-accessible documentation. We all get busy from time to time, but
> that's no reason to push a higher burden on those who are currently engaged
> with a particular effort, especially this early in development.
>
> That said, the first paragraph gives a suitable brief motivation (slightly
> rephrased below):
>
> > Some applications require response and availability SLAs that a single
> > HBase cluster cannot meet alone. Particularly at high percentiles,
> > queries to a single cluster can be delayed by e.g. GC pauses, individual
> > server process failure, or maintenance activity. By providing clients
> > with a transparent multi-cluster configuration option we can avoid these
> > outlier conditions by masking these failures from applications that are
> > tolerant of weaker consistency guarantees than HBase provides out of the box.
>
>
> Ted,
>
> Thanks for writing this up! We'd prefer to keep discussion of it on the
> mailing list, so please avoid moving to private WebEx sessions.
>
> Would you mind if I or one of the other community members converted the
> design doc to pdf so that it's more accessible?
>
>
>
> On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com>
> wrote:
>
> > Why don't we set up a WebEx to talk out the details? What times are you
> > open to talk this week?
> >
> > But to answer your questions: this is for active-active and
> > active-failover clusters. There is a primary and N number of failovers
> > per client. This covers both gets and puts.
> >
> > There are a number of configs in the doc that define how to fail over.
> > The options allow a couple of different use cases. There is a lot of
> > detail in the doc, and I just didn't want to put it all in the email.
> >
> > But honestly, I put a lot of time into the doc. I would love to know
> > what you think.
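> >
> > To give a flavor of it, here is a rough sketch of the client-side setup.
> > The hbase.mcc.* property names below are hypothetical placeholders; the
> > real keys are spelled out in the doc.
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> >
> > public class MccConfigSketch {
> >   public static void main(String[] args) {
> >     Configuration conf = HBaseConfiguration.create();
> >     // Primary cluster quorum (standard HBase key).
> >     conf.set("hbase.zookeeper.quorum", "zk-primary:2181");
> >     // Hypothetical keys: register N failover clusters and the threshold
> >     // that controls when the client gives up on the primary.
> >     conf.set("hbase.mcc.failover.clusters", "failover1,failover2");
> >     conf.set("hbase.mcc.failover1.hbase.zookeeper.quorum", "zk-failover1:2181");
> >     conf.set("hbase.mcc.failover2.hbase.zookeeper.quorum", "zk-failover2:2181");
> >     conf.set("hbase.mcc.wait.time.before.failover.ms", "100");
> >   }
> > }
> >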
> > On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com>
> > wrote:
> >
> > > Ted,
> > >
> > > If you can’t do a 30-second pitch, then it's not worth the effort. ;-)
> > >
> > > Look, when someone says that they want to have a single client talk to
> > > multiple HBase clusters, that could mean two very different things.
> > > First, you could mean that you want a single client to connect to an
> > > active/active pair of HBase clusters that replicate to each other.
> > > (Active/passive would also be implied, but then you have the question
> > > of when the passive cluster goes active.)
> > >
> > > Then you have the case of someone wanting to talk to multiple different
> > > clusters so that they can query the data and create local data sets
> > > which they wish to join, combining data from various sources.
> > >
> > > The second is a different problem from the first.
> > >
> > > -Mike
> > >
> > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > >
> > > > Hey Michael,
> > > >
> > > > Read the doc please.  It goes through everything at a low level.
> > > >
> > > > Thanks
> > > > Ted Malaska
> > > >
> > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> > > >
> > > >> No downtime?
> > > >>
> > > >> So you want a client to go against a pair of active/active HBase
> > > >> instances on tied clusters?
> > > >>
> > > >>
> > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > >>>
> > > >>> Hey Michael,
> > > >>>
> > > >>> The use case is simple: "no downtime" use cases, even in the case
> > > >>> of site failure.
> > > >>>
> > > >>> Now, on this statement:
> > > >>> "Why not simply manage each connection/context via a threaded
> > > >>> child?"
> > > >>>
> > > >>> That is the point: to make that simple, tested, easy, and
> > > >>> transparent for HBase users.
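> > > >>>
> > > >>> To make that concrete, the hand-rolled version ends up looking
> > > >>> roughly like the sketch below (standard client API only; the class
> > > >>> and method names are just illustrative). That is the boilerplate
> > > >>> this client is meant to hide:
> > > >>>
> > > >>> import java.util.concurrent.ExecutionException;
> > > >>> import java.util.concurrent.ExecutorService;
> > > >>> import java.util.concurrent.Executors;
> > > >>> import java.util.concurrent.Future;
> > > >>> import java.util.concurrent.TimeUnit;
> > > >>> import java.util.concurrent.TimeoutException;
> > > >>> import org.apache.hadoop.hbase.client.Get;
> > > >>> import org.apache.hadoop.hbase.client.Result;
> > > >>> import org.apache.hadoop.hbase.client.Table;
> > > >>>
> > > >>> public class HandRolledFailoverGet {
> > > >>>   private final ExecutorService pool = Executors.newCachedThreadPool();
> > > >>>
> > > >>>   // Ask the primary first; if it has not answered within timeoutMs,
> > > >>>   // read the (possibly stale) copy from the failover cluster.
> > > >>>   public Result get(Table primary, Table failover, Get get, long timeoutMs)
> > > >>>       throws Exception {
> > > >>>     Future<Result> f = pool.submit(() -> primary.get(get));
> > > >>>     try {
> > > >>>       return f.get(timeoutMs, TimeUnit.MILLISECONDS);
> > > >>>     } catch (TimeoutException | ExecutionException e) {
> > > >>>       f.cancel(true);
> > > >>>       return failover.get(get);
> > > >>>     }
> > > >>>   }
> > > >>> }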
> > > >>>
> > > >>> Ted Malaska
> > > >>>
> > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> > > >>>
> > > >>>> So if I understand your goal, you want a client that can connect
> > > >>>> to one or more HBase clusters at the same time…
> > > >>>>
> > > >>>> Ok, so let's walk through the use case and help me understand a
> > > >>>> couple of use cases for this…
> > > >>>>
> > > >>>> Why not simply manage each connection/context via a threaded
> > > >>>> child?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > >>>>>
> > > >>>>> Hey Dev List,
> > > >>>>>
> > > >>>>>
> > > >>>>> My name is Ted Malaska, long-time lover and user of HBase. I
> > > >>>>> would like to discuss adding a multi-cluster client to HBase.
> > > >>>>> Here is the link to the design doc (
> > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> > > >>>>> ), but I have pulled some parts into this e-mail to give you a
> > > >>>>> high-level understanding of its scope.
> > > >>>>>
> > > >>>>>
> > > >>>>> *Goals*
> > > >>>>>
> > > >>>>> The proposed solution is a multi-cluster HBase client that
> > > >>>>> relies on the existing HBase Replication functionality to
> > > >>>>> provide an eventually consistent solution in cases of primary
> > > >>>>> cluster downtime.
> > > >>>>>
> > > >>>>>
> > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > >>>>>
> > > >>>>>
> > > >>>>> Additional goals are:
> > > >>>>>
> > > >>>>> - Be able to switch from a single HBase cluster to the
> > > >>>>> Multi-HBase Client with limited or no code changes. This means
> > > >>>>> using the HConnectionManager, Connection, and Table interfaces
> > > >>>>> to hide complexities from the developer (Connection and Table
> > > >>>>> are the new classes for HConnection and HTableInterface in
> > > >>>>> HBase version 0.99); see the sketch after this list.
> > > >>>>> - Offer thresholds to allow developers to decide between
> > > >>>>> degrees of strong and eventual consistency.
> > > >>>>> - Support N number of linked HBase clusters.
> > > >>>>>
> > > >>>>>
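> > > >>>>> As a sketch of what "limited or no code changes" means:
> > > >>>>> application code like the example below is written only against
> > > >>>>> the public Connection/Table interfaces, so nothing in it is
> > > >>>>> specific to the multi-cluster client; the failover behavior
> > > >>>>> would come purely from configuration.
> > > >>>>>
> > > >>>>> import java.io.IOException;
> > > >>>>> import org.apache.hadoop.conf.Configuration;
> > > >>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
> > > >>>>> import org.apache.hadoop.hbase.TableName;
> > > >>>>> import org.apache.hadoop.hbase.client.Connection;
> > > >>>>> import org.apache.hadoop.hbase.client.ConnectionFactory;
> > > >>>>> import org.apache.hadoop.hbase.client.Get;
> > > >>>>> import org.apache.hadoop.hbase.client.Put;
> > > >>>>> import org.apache.hadoop.hbase.client.Result;
> > > >>>>> import org.apache.hadoop.hbase.client.Table;
> > > >>>>> import org.apache.hadoop.hbase.util.Bytes;
> > > >>>>>
> > > >>>>> public class AppCodeSketch {
> > > >>>>>   public static void main(String[] args) throws IOException {
> > > >>>>>     // Same code whether conf describes one cluster or, with the
> > > >>>>>     // multi-cluster configs, a primary plus failovers.
> > > >>>>>     Configuration conf = HBaseConfiguration.create();
> > > >>>>>     try (Connection conn = ConnectionFactory.createConnection(conf);
> > > >>>>>          Table table = conn.getTable(TableName.valueOf("t1"))) {
> > > >>>>>       table.put(new Put(Bytes.toBytes("row1"))
> > > >>>>>           .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"),
> > > >>>>>               Bytes.toBytes("v")));
> > > >>>>>       Result r = table.get(new Get(Bytes.toBytes("row1")));
> > > >>>>>       System.out.println(Bytes.toString(r.getRow()));
> > > >>>>>     }
> > > >>>>>   }
> > > >>>>> }
> > > >>>>>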
> > > >>>>> *Read-Replicas*
> > > >>>>> Also note this is in alignment with Read-Replicas and can work
> > > >>>>> with them. This client is multi-cluster, whereas Read-Replicas
> > > >>>>> help us be multi-RegionServer.
> > > >>>>>
> > > >>>>> *Replication*
> > > >>>>> You will also see in the document that this works with current
> > > >>>>> replication and requires no changes to it.
> > > >>>>>
> > > >>>>> *Only a Client change*
> > > >>>>> You will also see in the doc that this is only a new client,
> > > >>>>> which means no extra code for the end developer, only additional
> > > >>>>> configs to set it up.
> > > >>>>>
> > > >>>>> *Github*
> > > >>>>> There is a GitHub project that shows this works:
> > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > >>>>> Note this is only a prototype. When adding it to HBase we will
> > > >>>>> use it as a starting point, but there will be changes.
> > > >>>>>
> > > >>>>> *Initial Results:*
> > > >>>>>
> > > >>>>> Red is where our primary cluster has failed, and you will see
> > > >>>>> from the bottom two graphs that our puts, deletes, and gets are
> > > >>>>> not interrupted.
> > > >>>>>
> > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > >>>>>
> > > >>>>> Thanks
> > > >>>>> Ted Malaska
> > > >>>>
> > > >>>> The opinions expressed here are mine, while they may reflect a
> > > >>>> cognitive thought, that is purely accidental.
> > > >>>> Use at your own risk.
> > > >>>> Michael Segel
> > > >>>> michael_segel (AT) hotmail.com
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>
> > > >> The opinions expressed here are mine, while they may reflect a
> > > >> cognitive thought, that is purely accidental.
> > > >> Use at your own risk.
> > > >> Michael Segel
> > > >> michael_segel (AT) hotmail.com
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> > > The opinions expressed here are mine, while they may reflect a
> > > cognitive thought, that is purely accidental.
> > > Use at your own risk.
> > > Michael Segel
> > > michael_segel (AT) hotmail.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Sean
>
