hbase-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Tue, 30 Jun 2015 11:35:43 GMT
Hey Ramkrishna,

I think you're right that there are some things that are the same. The difference
is the problem they are trying to solve and the scope.

In the HBase.MCC design it is only about cluster failover and keeping 100%
uptime in the case of single site failure. The Cross-site BigTable looks
to have some of that too, but it is also more complex because it has the
requirement of data being local to a single cluster, so you need to see
all the clusters to get all the data.

Maybe I'm wrong, but they are not solving the same problem. Also,
because of HBase.MCC's limited scope, it is far easier to implement and
maintain.

Now, although I agree that the Cross-site BigTable has a valid use
case, the use case for HBase.MCC is more to level the playing field with
Cassandra in the marketplace: to allow us to have eventual consistency in
the case of single site failure, with configs to determine what thresholds
must be passed before accepting those eventually consistent records.

This will allow HBase to better compete for use cases that involve Near
Real Time Streaming. This is important because moving batch workloads to
near real time is the new hot trend in the market today. I think HBase is
the best solution out there today for this, except for the fact that on site
or region server failure we lose functionality (reads and writes on site
failure, and writes on RS failure).
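
A minimal Java sketch of the threshold idea, assuming a hypothetical `ClusterReader` interface as a stand-in for a Get against one cluster. The class, the `waitBeforeFailoverMs` parameter, and the `fromPrimary` flag are illustrative only; they are not taken from HBase.MCC or the HBase client API. The primary gets a head start equal to the threshold. If it has not answered (or has failed) by then, every failover cluster is asked too, and the first successful answer wins. Failover answers are flagged because, with asynchronous replication, they may be stale.

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of threshold-based failover reads; not the HBase.MCC API. */
public class ThresholdFailoverGet {

    /** Hypothetical stand-in for a Get against a single cluster. */
    public interface ClusterReader {
        String get(String rowKey) throws Exception;
    }

    /** A value plus whether it came from the primary (false means possibly stale). */
    public static final class Result {
        public final String value;
        public final boolean fromPrimary;

        Result(String value, boolean fromPrimary) {
            this.value = value;
            this.fromPrimary = fromPrimary;
        }
    }

    private final ClusterReader primary;
    private final List<ClusterReader> failovers;
    private final long waitBeforeFailoverMs; // illustrative threshold, not a real config key
    // Daemon threads so abandoned (slow) cluster calls never keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    public ThresholdFailoverGet(ClusterReader primary, List<ClusterReader> failovers,
                                long waitBeforeFailoverMs) {
        this.primary = primary;
        this.failovers = failovers;
        this.waitBeforeFailoverMs = waitBeforeFailoverMs;
    }

    public Result get(String rowKey) throws InterruptedException, ExecutionException {
        CompletionService<Result> cs = new ExecutorCompletionService<>(pool);
        cs.submit(() -> new Result(primary.get(rowKey), true));
        int outstanding = 1;
        ExecutionException lastError = null;

        // Give the primary a head start of waitBeforeFailoverMs.
        Future<Result> early = cs.poll(waitBeforeFailoverMs, TimeUnit.MILLISECONDS);
        if (early != null) {
            outstanding--;
            try {
                return early.get();   // primary answered within the threshold
            } catch (ExecutionException e) {
                lastError = e;        // primary failed fast; fall through to failovers
            }
        }

        // Threshold passed (or primary failed): ask every failover cluster as well.
        for (ClusterReader f : failovers) {
            cs.submit(() -> new Result(f.get(rowKey), false));
            outstanding++;
        }

        // Return the first successful answer, whichever cluster produces it.
        while (outstanding > 0) {
            Future<Result> done = cs.take();
            outstanding--;
            try {
                return done.get();
            } catch (ExecutionException e) {
                lastError = e;
            }
        }
        throw lastError; // every cluster failed
    }
}
```

A smaller `waitBeforeFailoverMs` masks slow primaries (GC pauses, RS restarts) sooner, at the cost of returning possibly stale failover data more often.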

In the end, HBase.MCC's limited scope is what hopefully should make it
exciting. All we need to do is make a new client and update the connection
factory to give you that multi-cluster client when requested through the
configs. Nothing in ZK or HBase core would have to be touched.
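
The "only a client change" idea can be sketched as a config-driven factory. Application code depends on one interface, and configuration alone decides whether it receives a single-cluster or a multi-cluster implementation. Everything below is hypothetical and self-contained: `SimpleClient`, both implementations, and the `example.*` property names are illustrative stand-ins, not HBase or HBase.MCC classes or config keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of config-driven client selection; not HBase's connection factory. */
public class ClientFactorySketch {

    // Illustrative property names only; they are not real HBase or HBase.MCC keys.
    public static final String PRIMARY_KEY = "example.primary.quorum";
    public static final String FAILOVER_KEY = "example.failover.quorums";

    /** The one interface application code is written against. */
    public interface SimpleClient {
        String describe();
    }

    static final class SingleClusterClient implements SimpleClient {
        private final String quorum;

        SingleClusterClient(String quorum) {
            this.quorum = quorum;
        }

        @Override
        public String describe() {
            return "single:" + quorum;
        }
    }

    static final class MultiClusterClient implements SimpleClient {
        private final String primary;
        private final List<String> failovers;

        MultiClusterClient(String primary, List<String> failovers) {
            this.primary = primary;
            this.failovers = Collections.unmodifiableList(failovers);
        }

        @Override
        public String describe() {
            return "multi:" + primary + "->" + String.join(",", failovers);
        }
    }

    /** Returns a multi-cluster client only when at least one failover cluster is configured. */
    public static SimpleClient createClient(Map<String, String> conf) {
        String primary = conf.get(PRIMARY_KEY);
        if (primary == null || primary.trim().isEmpty()) {
            throw new IllegalArgumentException("missing " + PRIMARY_KEY);
        }
        List<String> failovers = new ArrayList<>();
        for (String quorum : conf.getOrDefault(FAILOVER_KEY, "").split(",")) {
            if (!quorum.trim().isEmpty()) {
                failovers.add(quorum.trim());
            }
        }
        return failovers.isEmpty()
                ? new SingleClusterClient(primary.trim())
                : new MultiClusterClient(primary.trim(), failovers);
    }
}
```

Adding or removing a failover quorum in the configuration changes the returned implementation without touching the calling code.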

Side note: because of the flexibility in the HBase.MCC configs, there is a
way to reach a good majority of the Cross-site BigTable goals with just
HBase.MCC.
Last question: What became of Cross-site BigTable?

Let me know if you think this is correct.
Thanks
Ted Malaska

On Tue, Jun 30, 2015 at 12:42 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Ted
>
> I think the idea here is very similar to the Cross-site Big Table project
> that was presented at HBaseCon 2014.
>
> Please find the slides linked below:
> http://www.slideshare.net/HBaseCon/ecosystem-session-3
> This project also adds client-side wrappers so that the client can
> internally fail over in case a cluster goes down, automatically
> switching over to the replicated clusters based on the configuration.
> Let us know if you find this interesting.
>
> Regards
> Ram
>
> On Tue, Jun 30, 2015 at 4:01 AM, Ted Malaska <ted.malaska@cloudera.com> wrote:
>
> > lol, I did, sorry. This is the right doc:
> >
> > https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx.pdf
> >
> > On Mon, Jun 29, 2015 at 6:30 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >
> > > I think you may have put up the wrong document? That link goes to a
> > > product doc.
> > >
> > > On Mon, Jun 29, 2015 at 3:24 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > >
> > > > Here is the PDF link:
> > > >
> > > > https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf
> > > >
> > > > On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com> wrote:
> > > >
> > > > > Michael,
> > > > >
> > > > > This is the dev list; no sound-bite pitch is needed. We have plenty of
> > > > > features whose nuances take time to explain. Please either engage with
> > > > > the complexity of the topic or wait for the feature to land and get
> > > > > user-accessible documentation. We all get busy from time to time, but
> > > > > that's no reason to push a higher burden onto those who are currently
> > > > > engaged with a particular effort, especially this early in development.
> > > > >
> > > > > That said, the first paragraph gives a suitable brief motivation
> > > > > (slightly rephrased below):
> > > > >
> > > > > > Some applications require response and availability SLAs that a single
> > > > > > HBase cluster cannot meet alone. Particularly for high percentiles,
> > > > > > queries to a single cluster can be delayed by, e.g., GC pauses,
> > > > > > individual server process failure, or maintenance activity. By
> > > > > > providing clients with a transparent multi-cluster configuration
> > > > > > option, we can avoid these outlier conditions by masking these
> > > > > > failures from applications that are tolerant to weaker consistency
> > > > > > guarantees than HBase provides out of the box.
> > > > >
> > > > >
> > > > > Ted,
> > > > >
> > > > > Thanks for writing this up! We'd prefer to keep discussion of it on the
> > > > > mailing list, so please avoid moving to private WebEx calls.
> > > > >
> > > > > Would you mind if I or one of the other community members converted the
> > > > > design doc to PDF so that it's more accessible?
> > > > >
> > > > > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > > >
> > > > > > Why don't we set up a WebEx to talk out the details? What times are
> > > > > > you open to talk this week?
> > > > > >
> > > > > > But to answer your questions: this is for active-active and
> > > > > > active-failover clusters. There is a primary and N failovers per
> > > > > > client. This is for gets and puts.
> > > > > >
> > > > > > There are a number of configs in the doc to define how to fail over.
> > > > > > The options allow a couple of different use cases. There is a lot of
> > > > > > detail in the doc and I just didn't want to put it all in the email.
> > > > > >
> > > > > > But honestly, I put a lot of time into the doc. I would love to know
> > > > > > what you think.
> > > > > > On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
> > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > > If you can’t do a 30 second pitch, then it’s not worth the effort. ;-)
> > > > > > >
> > > > > > > Look, when someone says that they want to have a single client talk
> > > > > > > to multiple HBase clusters, that could mean two very different
> > > > > > > things.
> > > > > > > First, you could mean that you want a single client to connect to an
> > > > > > > active/active pair of HBase clusters where they replicate to each
> > > > > > > other. (Active/Passive would also be implied, but then you have the
> > > > > > > issue of when the passive cluster goes active.)
> > > > > > >
> > > > > > > Then you have the issue of someone wanting to talk to multiple
> > > > > > > different clusters so that they can query the data and create local
> > > > > > > data sets which they wish to join, combining data from various
> > > > > > > sources.
> > > > > > >
> > > > > > > The second is a different problem from the first.
> > > > > > >
> > > > > > > -Mike
> > > > > > >
> > > > > > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > > > > > >
> > > > > > > > Hey Michael,
> > > > > > > >
> > > > > > > > Read the doc, please. It goes through everything at a low level.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Ted Malaska
> > > > > > > >
> > > > > > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> > > > > > > >
> > > > > > > >> No down time?
> > > > > > > >>
> > > > > > > >> So you want a client to go against a pair of active/active HBase
> > > > > > > >> instances on tied clusters?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > > > > > >>>
> > > > > > > >>> Hey Michael,
> > > > > > > >>>
> > > > > > > >>> The use case is simple: "no down time" use cases, even in the case
> > > > > > > >>> of site failure.
> > > > > > > >>>
> > > > > > > >>> Now, on this statement:
> > > > > > > >>> "Why not simply manage each connection/context via a threaded child?"
> > > > > > > >>>
> > > > > > > >>> That is the point: to make that simple, tested, easy, and
> > > > > > > >>> transparent for HBase users.
> > > > > > > >>>
> > > > > > > >>> Ted Malaska
> > > > > > > >>>
> > > > > > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> > > > > > > >>>
> > > > > > > >>>> So if I understand your goal, you want a client that can connect
> > > > > > > >>>> to one or more HBase clusters at the same time…
> > > > > > > >>>>
> > > > > > > >>>> OK, so let's walk through the use case and help me understand a
> > > > > > > >>>> couple of use cases for this…
> > > > > > > >>>>
> > > > > > > >>>> Why not simply manage each connection/context via a threaded child?
> > > > > > > >>>>
> > > > > > > >>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>> Hey Dev List,
> > > > > > > >>>>>
> > > > > > > >>>>> My name is Ted Malaska, a longtime lover and user of HBase. I
> > > > > > > >>>>> would like to discuss adding a multi-cluster client to HBase.
> > > > > > > >>>>> Here is the link to the design doc
> > > > > > > >>>>> (https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx),
> > > > > > > >>>>> but I have pulled some parts into this main e-mail to give you a
> > > > > > > >>>>> high-level understanding of its scope.
> > > > > > > >>>>>
> > > > > > > >>>>> *Goals*
> > > > > > > >>>>>
> > > > > > > >>>>> The proposed solution is a multi-cluster HBase client that relies
> > > > > > > >>>>> on the existing HBase Replication functionality to provide an
> > > > > > > >>>>> eventually consistent solution in cases of primary cluster down
> > > > > > > >>>>> time.
> > > > > > > >>>>>
> > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > > > > > >>>>>
> > > > > > > >>>>> Additional goals are:
> > > > > > > >>>>>
> > > > > > > >>>>> - Be able to switch from a single HBase cluster to the
> > > > > > > >>>>>   Multi-HBase Client with limited or no code changes. This means
> > > > > > > >>>>>   using the HConnectionManager, Connection, and Table interfaces
> > > > > > > >>>>>   to hide complexities from the developer (Connection and Table
> > > > > > > >>>>>   are the new interfaces that replace HConnection and
> > > > > > > >>>>>   HTableInterface in HBase version 0.99).
> > > > > > > >>>>> - Offer thresholds to allow developers to decide between degrees
> > > > > > > >>>>>   of strong consistency and eventual consistency.
> > > > > > > >>>>> - Support N linked HBase clusters.
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> *Read-Replicas*
> > > > > > > >>>>> Also note this is in alignment with Read-Replicas and can work
> > > > > > > >>>>> with them. This client is multi-cluster, whereas Read-Replicas
> > > > > > > >>>>> help us to be multi-Region Server.
> > > > > > > >>>>>
> > > > > > > >>>>> *Replication*
> > > > > > > >>>>> You will also see in the document that this works with current
> > > > > > > >>>>> replication and requires no changes to it.
> > > > > > > >>>>>
> > > > > > > >>>>> *Only a Client change*
> > > > > > > >>>>> You will also see in the doc that this is only a new client,
> > > > > > > >>>>> which means no extra code for the end developer, only additional
> > > > > > > >>>>> configs to set it up.
> > > > > > > >>>>>
> > > > > > > >>>>> *GitHub*
> > > > > > > >>>>> This is a GitHub project that shows that this works:
> > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > > > > > >>>>> Note this is only a prototype. When adding it to HBase we will
> > > > > > > >>>>> use it as a starting point, but there will be changes.
> > > > > > > >>>>>
> > > > > > > >>>>> *Initial Results:*
> > > > > > > >>>>>
> > > > > > > >>>>> Red is where our primary cluster has failed, and you will see
> > > > > > > >>>>> from the bottom two graphs that our puts, deletes, and gets are
> > > > > > > >>>>> not interrupted.
> > > > > > > >>>>>
> > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks
> > > > > > > >>>>> Ted Malaska
> > > > > > > >>>>
> > > > > > > >>>> The opinions expressed here are mine, while they may reflect a
> > > > > > > >>>> cognitive thought, that is purely accidental.
> > > > > > > >>>> Use at your own risk.
> > > > > > > >>>> Michael Segel
> > > > > > > >>>> michael_segel (AT) hotmail.com
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sean
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> > >
> >
>
