hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Mon, 29 Jun 2015 22:30:14 GMT
I think you may have put up the wrong document? That link goes to product
doc.


On Mon, Jun 29, 2015 at 3:24 PM, Ted Malaska <ted.malaska@cloudera.com>
wrote:

> Here is the PDF link.
>
>
> https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf
>
> On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > Michael,
> >
> > This is the dev list, no sound-bite pitch is needed. We have plenty of
> > features that take time to explain the nuance. Please either engage with
> > the complexity of the topic or wait for the feature to land and get
> > user-accessible documentation. We all get busy from time to time, but
> > that's no reason to push a higher burden on those who are currently
> > engaged with a particular effort, especially this early in development.
> >
> > That said, the first paragraph gives a suitable brief motivation
> > (slightly rephrased below):
> >
> > > Some applications require response and availability SLAs that a single
> > > HBase cluster cannot meet alone. Particularly for high percentiles,
> > > queries to a single cluster can be delayed by e.g. GC pauses, individual
> > > server process failure, or maintenance activity. By providing clients
> > > with a transparent multi-cluster configuration option we can avoid these
> > > outlier conditions by masking these failures from applications that are
> > > tolerant to weaker consistency guarantees than HBase provides out of the
> > > box.
> >
> >
> > Ted,
> >
> > Thanks for writing this up! We'd prefer to keep discussion of it on the
> > mailing list, so please avoid moving to private webex's.
> >
> > Would you mind if I or one of the other community members converted the
> > design doc to pdf so that it's more accessible?
> >
> >
> >
> > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com>
> > wrote:
> >
> > > Why don't we set up a WebEx to talk through the details? What times are
> > > you open to talk this week?
> > >
> > > But to answer your questions: this is for active-active and
> > > active-failover clusters. There is a primary and N failovers per
> > > client. This is for gets and puts.
> > >
> > > There are a number of configs in the doc to define how to fail over. The
> > > options allow a couple of different use cases. There is a lot of detail
> > > in the doc and I just didn't want to put it all in the email.
> > >
> > > But honestly I put a lot of time into the doc. I would love to know what
> > > you think.
> > > On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com>
> > > wrote:
> > >
> > > > Ted,
> > > >
> > > > If you can’t do a 30 second pitch, then it’s not worth the effort. ;-)
> > > >
> > > > Look, when someone says that they want to have a single client talk to
> > > > multiple HBase clusters, that could mean two very different things.
> > > > First, you could mean that you want a single client to connect to an
> > > > active/active pair of HBase clusters where they replicate to each other.
> > > > (Active/passive would also be implied, but then you have the issue of
> > > > when the passive cluster goes active.)
> > > >
> > > > Then you have the issue of someone wanting to talk to multiple different
> > > > clusters so that they can query the data and create local data sets
> > > > which they wish to join, combining data from various sources.
> > > >
> > > > The second is a different problem from the first.
> > > >
> > > > -Mike
> > > >
> > > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > wrote:
> > > > >
> > > > > Hey Michael,
> > > > >
> > > > > Read the doc please.  It goes through everything at a low level.
> > > > >
> > > > > Thanks
> > > > > Ted Malaska
> > > > >
> > > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > wrote:
> > > > >
> > > > >> No down time?
> > > > >>
> > > > > >> So you want a client to go against a pair of active/active HBase
> > > > > >> instances on tied clusters?
> > > > >>
> > > > >>
> > > > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > >> wrote:
> > > > >>>
> > > > >>> Hey Michael,
> > > > >>>
> > > > > >>> The use case is simple: "no downtime" use cases, even in the case of
> > > > > >>> site failure.
> > > > > >>>
> > > > > >>> Now on this statement
> > > > > >>> "Why not simply manage each connection/context via a threaded child?"
> > > > > >>>
> > > > > >>> That is the point, to make that simple, tested, easy, and transparent
> > > > > >>> for HBase users.
> > > > >>>
> > > > >>> Ted Malaska
> > > > >>>
> > > > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > >>> wrote:
> > > > >>>
> > > > > >>>> So if I understand your goal, you want a client who can connect to
> > > > > >>>> one or more HBase clusters at the same time…
> > > > > >>>>
> > > > > >>>> Ok, so let's walk through the use case and help me understand a
> > > > > >>>> couple of use cases for this…
> > > > > >>>>
> > > > > >>>> Why not simply manage each connection/context via a threaded child?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > > >>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > >>>> wrote:
> > > > >>>>>
> > > > >>>>> Hey Dev List,
> > > > >>>>>
> > > > >>>>>
> > > > > >>>>> My name is Ted Malaska, long time lover and user of HBase. I would
> > > > > >>>>> like to discuss adding a multi-cluster client to HBase. Here is the
> > > > > >>>>> link for the design doc (
> > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> > > > > >>>>> ) but I have pulled some parts into this main e-mail to give you a
> > > > > >>>>> high level understanding of its scope.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> *Goals*
> > > > >>>>>
> > > > > >>>>> The proposed solution is a multi-cluster HBase client that relies on
> > > > > >>>>> the existing HBase Replication functionality to provide an eventually
> > > > > >>>>> consistent solution in cases of primary cluster down time.
> > > > > >>>>>
> > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Additional goals are:
> > > > >>>>>
> > > > >>>>> -
> > > > >>>>>
> > > > >>>>> Be able to switch between single HBase clusters to
Multi-HBase
> > > Client
> > > > >>>>> with limited or no code changes.  This means using
the
> > > > >>>> HConnectionManager,
> > > > >>>>> Connection, and Table interfaces to hide complexities
from the
> > > > >>>> developer
> > > > >>>>> (Connection and Table are the new classes for HConnection,
and
> > > > >>>>> HTableInterface in HBase version 0.99).
> > > > >>>>> -
> > > > >>>>>
> > > > >>>>> Offer thresholds to allow developers to decide between
degrees
> of
> > > > >>>>> strongly consistent and eventually consistent.
> > > > >>>>> - Support N number of linked HBase Clusters
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> *Read-Replicas*
> > > > > >>>>> Also note this is in alignment with Read-Replicas and can work with
> > > > > >>>>> that. This client is multi-cluster where Read-Replicas help us to be
> > > > > >>>>> multi Region Server.
> > > > >>>>>
> > > > >>>>> *Replication*
> > > > > >>>>> You will also see in the document that this works with current
> > > > > >>>>> replication and requires no changes to it.
> > > > >>>>>
> > > > >>>>> *Only a Client change*
> > > > > >>>>> You will also see in the doc this is only a new client, which means
> > > > > >>>>> no extra code for the end developer, only additional configs to set
> > > > > >>>>> it up.
> > > > >>>>>
> > > > >>>>> *Github*
> > > > > >>>>> This is a GitHub project that shows that this works:
> > > > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > > > >>>>> Note this is only a prototype. When adding it to HBase we will use it
> > > > > >>>>> as a starting point but there will be changes.
> > > > >>>>>
> > > > >>>>> *Initial Results:*
> > > > >>>>>
> > > > > >>>>> Red is where our primary cluster has failed, and you will see from
> > > > > >>>>> the bottom two graphs that our puts, deletes, and gets are not
> > > > > >>>>> interrupted.
> > > > > >>>>>
> > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > > >>>>>
> > > > >>>>> Thanks
> > > > >>>>> Ted Malaska
> > > > >>>>
> > > > > >>>> The opinions expressed here are mine, while they may reflect a
> > > > > >>>> cognitive thought, that is purely accidental.
> > > > >>>> Use at your own risk.
> > > > >>>> Michael Segel
> > > > >>>> michael_segel (AT) hotmail.com
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Sean
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
