hbase-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Tue, 30 Jun 2015 13:24:26 GMT
Cool, let me know.  If we position HBase.MCC correctly, maybe we can kill
two birds with one stone, at least for the client part.  It would be nice
to have a configurable client in core that supports use cases like this.

On Tue, Jun 30, 2015 at 9:18 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Thanks Ted.
>
> Yes, as you said, the idea is to solve a bigger use case: a globally
> distributed cluster where the data is local to each cluster, i.e. the
> data that we write and read is local to that geography or cluster.  The
> Cross-site Big Table will help you read and write from such a cluster
> transparently, just by differentiating clusters with a cluster id.
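The cluster-id routing described above can be sketched as a toy router; the class, the `id:` row-key convention, and the endpoint strings below are all invented for illustration and are not the actual CSBT API:

```java
import java.util.Map;

// Hypothetical sketch: pick the cluster that owns a row by a cluster-id
// prefix on the row key ("us:user42" -> the "us" cluster's endpoint).
public class ClusterRouter {
    private final Map<String, String> clusterById; // cluster id -> endpoint

    ClusterRouter(Map<String, String> clusterById) {
        this.clusterById = clusterById;
    }

    String endpointFor(String rowKey) {
        String clusterId = rowKey.split(":", 2)[0];
        String endpoint = clusterById.get(clusterId);
        if (endpoint == null) {
            throw new IllegalArgumentException("unknown cluster id: " + clusterId);
        }
        return endpoint;
    }

    public static void main(String[] args) {
        ClusterRouter router = new ClusterRouter(
            Map.of("us", "zk-us:2181", "eu", "zk-eu:2181"));
        System.out.println(router.endpointFor("eu:user42")); // prints zk-eu:2181
    }
}
```

The point of the sketch is only that reads and writes stay local to the geography that owns the key, which is the part HBase.MCC does not try to solve.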
>
> But the other subset of the problem that HBase.MCC solves can also be
> achieved because the failover switching during writes/reads happens based
> on the replication setup that is available in that local cluster.
>
> As for the state of CSBT: I need to check the latest update, but it was
> earlier discussed that CSBT could not be part of the hbase package and
> would instead be a standalone tool.  I can get an update on that.
>
> Regards
> Ram
>
>
> On Tue, Jun 30, 2015 at 5:05 PM, Ted Malaska <ted.malaska@cloudera.com>
> wrote:
>
> > Hey Ramkrishna,
> >
> > I think you're right that some things are the same.  The difference
> > is the problem they are trying to solve and the scope.
> >
> > The HBase.MCC design is only about cluster failover and keeping 100%
> > uptime in the case of a single-site failure.  The Cross-site Big Table
> > looks to have some of that too, but it is also more complex because it
> > has the requirement that data be local to a single cluster.  So you
> > need to see all the clusters to get all the data.
> >
> > Maybe I'm wrong, but they are not solving the same problem.  Also,
> > because of HBase.MCC's limited scope, it is far easier to implement
> > and maintain.
> >
> > Now, although I agree that the Cross-site Big Table has a valid use
> > case, the use case for HBase.MCC is more to level the playing field
> > with Cassandra in the marketplace: to allow us to have eventual
> > consistency in the case of single-site failure, with configs to
> > determine what thresholds must be passed before accepting those
> > eventually consistent records.
> >
> > This will allow HBase to better compete for use cases that involve
> > near-real-time streaming.  This is important because the hot new trend
> > in the market today is to move batch workloads to near real time.  I
> > think HBase is the best solution out there today for this, except for
> > the fact that on site or region server failure we lose functionality
> > (reads and writes on site failure, and writes on RS failure).
> >
> > In the end, HBase.MCC's scope is what hopefully should make it
> > exciting.  All we need to do is make a new client and update the
> > connection factory to give you the multi-cluster client when requested
> > through the configs.  Nothing in ZK or HBase core would have to be
> > touched.
> >
> > Side note: because of the flexibility of the HBase.MCC configs, there
> > is a way to reach a good majority of the Cross-site Big Table goals
> > with just HBase.MCC.
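That connection-factory hook could look roughly like this sketch; the `hbase.mcc.enabled` key and every class below are hypothetical stand-ins for illustration, not the actual HBase.MCC or HBase client API:

```java
import java.util.Properties;

// Hypothetical sketch: one factory entry point that hands back a
// multi-cluster client only when the configs ask for it.
interface ClusterClient {
    String mode();
}

class SingleClusterClient implements ClusterClient {
    public String mode() { return "single"; }
}

class MultiClusterClient implements ClusterClient {
    public String mode() { return "multi"; }
}

public class McFactory {
    // Mirrors the idea of ConnectionFactory.createConnection(conf): the
    // caller's code does not change, only the configuration does.
    // "hbase.mcc.enabled" is an assumed name, not a real HBase key.
    public static ClusterClient createClient(Properties conf) {
        boolean multi =
            Boolean.parseBoolean(conf.getProperty("hbase.mcc.enabled", "false"));
        return multi ? new MultiClusterClient() : new SingleClusterClient();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hbase.mcc.enabled", "true");
        System.out.println(createClient(conf).mode()); // prints multi
    }
}
```

The design choice being illustrated is that nothing outside the client changes: existing callers keep one factory call, and the config alone decides whether failover behavior is layered in.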
> > Last question: What became of Cross-site BigTable?
> >
> > Let me know if you think this is correct.
> > Thanks
> > Ted Malaska
> >
> > On Tue, Jun 30, 2015 at 12:42 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Hi Ted
> > >
> > > I think the idea here is very similar to the Cross-site Big Table
> > > project that was presented at HBaseCon 2014.
> > >
> > > Please find the slide link below:
> > > http://www.slideshare.net/HBaseCon/ecosystem-session-3
> > > This project also adds client-side wrappers so that the client can
> > > internally do a failover in case of a cluster going down,
> > > automatically switching over to the replicated clusters based on the
> > > configuration.  Let us know if you find this interesting.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > >
> > > On Tue, Jun 30, 2015 at 4:01 AM, Ted Malaska <ted.malaska@cloudera.com>
> > > wrote:
> > >
> > > > lol, I did, sorry.  This is the right doc:
> > > >
> > > > https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx.pdf
> > > >
> > > > On Mon, Jun 29, 2015 at 6:30 PM, Andrew Purtell <apurtell@apache.org>
> > > > wrote:
> > > >
> > > > > I think you may have put up the wrong document?  That link goes to
> > > > > a product doc.
> > > > >
> > > > >
> > > > > On Mon, Jun 29, 2015 at 3:24 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > Here is the PDF link:
> > > > > >
> > > > > > https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf
> > > > > >
> > > > > > On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Michael,
> > > > > > >
> > > > > > > This is the dev list; no sound-bite pitch is needed.  We have
> > > > > > > plenty of features that take time to explain the nuance.  Please
> > > > > > > either engage with the complexity of the topic or wait for the
> > > > > > > feature to land and get user-accessible documentation.  We all
> > > > > > > get busy from time to time, but that's no reason to push a
> > > > > > > higher burden on those who are currently engaged with a
> > > > > > > particular effort, especially this early in development.
> > > > > > >
> > > > > > > That said, the first paragraph gives a suitable brief motivation
> > > > > > > (slightly rephrased below):
> > > > > > >
> > > > > > > > Some applications require response and availability SLAs that
> > > > > > > > a single HBase cluster can not meet alone.  Particularly for
> > > > > > > > high percentiles, queries to a single cluster can be delayed
> > > > > > > > by e.g. GC pauses, individual server process failure, or
> > > > > > > > maintenance activity.  By providing clients with a transparent
> > > > > > > > multi-cluster configuration option we can avoid these outlier
> > > > > > > > conditions by masking such failures from applications that are
> > > > > > > > tolerant of weaker consistency guarantees than HBase provides
> > > > > > > > out of the box.
> > > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > > Thanks for writing this up!  We'd prefer to keep discussion of
> > > > > > > it on the mailing list, so please avoid moving to private
> > > > > > > webexes.
> > > > > > >
> > > > > > > Would you mind if I or one of the other community members
> > > > > > > converted the design doc to PDF so that it's more accessible?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Why don't we set up a webex to talk out the details?  What
> > > > > > > > times are you open to talk this week?
> > > > > > > >
> > > > > > > > But to answer your questions: this is for active-active and
> > > > > > > > active-failover clusters.  There is a primary and N number of
> > > > > > > > failovers per client.  This is for gets and puts.
> > > > > > > >
> > > > > > > > There are a number of configs in the doc to define how to fail
> > > > > > > > over.  The options allow a couple of different use cases.
> > > > > > > > There is a lot of detail in the doc and I just didn't want to
> > > > > > > > put it all in the email.
> > > > > > > >
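For a flavor of how such failover configs might read, here is a hypothetical fragment; every property name below is an illustrative guess, not an actual key from the design doc:

```properties
# Hypothetical property names, for illustration only.
# Failover clusters to try, in order, after the primary.
hbase.mcc.failover.clusters=cluster-b,cluster-c
# How long to wait on the primary before trying a failover cluster (ms).
hbase.mcc.wait.time.before.trying.failover=100
# How long to keep preferring a failover cluster after a primary failure (ms).
hbase.mcc.wait.time.before.retrying.primary=30000
```

The actual keys and defaults are spelled out in the design doc linked earlier in the thread.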
> > > > > > > > But honestly, I put a lot of time into the doc.  I would
> > > > > > > > love to know what you think.
> > > > > > > > On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Ted,
> > > > > > > > >
> > > > > > > > > If you can’t do a 30-second pitch, then it’s not worth the
> > > > > > > > > effort. ;-)
> > > > > > > > >
> > > > > > > > > Look, when someone says that they want to have a single
> > > > > > > > > client talk to multiple HBase clusters, that could mean two
> > > > > > > > > very different things.
> > > > > > > > > First, you could mean that you want a single client to
> > > > > > > > > connect to an active/active pair of HBase clusters that
> > > > > > > > > replicate to each other.  (Active/passive would also be
> > > > > > > > > implied, but then you have the issue of when the passive
> > > > > > > > > cluster goes active.)
> > > > > > > > >
> > > > > > > > > Then you have the issue of someone wanting to talk to
> > > > > > > > > multiple different clusters so that they can query the data
> > > > > > > > > and create local data sets which they wish to join,
> > > > > > > > > combining data from various sources.
> > > > > > > > >
> > > > > > > > > The second is a different problem from the first.
> > > > > > > > >
> > > > > > > > > -Mike
> > > > > > > > >
> > > > > > > > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hey Michael,
> > > > > > > > > >
> > > > > > > > > > Read the doc please.  It goes through everything at a
> > > > > > > > > > low level.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Ted Malaska
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> No down time?
> > > > > > > > > >>
> > > > > > > > > >> So you want a client to go against a pair of
> > > > > > > > > >> active/active HBase instances on tied clusters?
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> Hey Michael,
> > > > > > > > > >>>
> > > > > > > > > >>> The use case is simple: "no downtime" use cases, even
> > > > > > > > > >>> in the case of site failure.
> > > > > > > > > >>>
> > > > > > > > > >>> Now, on this statement:
> > > > > > > > > >>> "Why not simply manage each connection/context via a
> > > > > > > > > >>> threaded child?"
> > > > > > > > > >>>
> > > > > > > > > >>> That is the point: to make that simple, tested, easy,
> > > > > > > > > >>> and transparent for HBase users.
> > > > > > > > > >>>
> > > > > > > > > >>> Ted Malaska
> > > > > > > > > >>>
> > > > > > > > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com>
> > > > > > > > > >>> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>>> So if I understand your goal, you want a client who
> > > > > > > > > >>>> can connect to one or more HBase clusters at the same
> > > > > > > > > >>>> time…
> > > > > > > > > >>>>
> > > > > > > > > >>>> Ok, so let's walk through the use case and help me
> > > > > > > > > >>>> understand a couple of use cases for this…
> > > > > > > > > >>>>
> > > > > > > > > >>>> Why not simply manage each connection/context via a
> > > > > > > > > >>>> threaded child?
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com>
> > > > > > > > > >>>> wrote:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Hey Dev List,
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> My name is Ted Malaska, long-time lover and user of
> > > > > > > > > >>>>> HBase.  I would like to discuss adding a multi-cluster
> > > > > > > > > >>>>> client to HBase.  Here is the link for the design doc:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> I have pulled some parts into this main e-mail to give
> > > > > > > > > >>>>> you a high-level understanding of its scope.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Goals*
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> The proposed solution is a multi-cluster HBase client
> > > > > > > > > >>>>> that relies on the existing HBase replication
> > > > > > > > > >>>>> functionality to provide an eventually consistent
> > > > > > > > > >>>>> solution in cases of primary cluster downtime.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Additional goals are:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> - Be able to switch from a single HBase cluster to the
> > > > > > > > > >>>>> Multi-HBase Client with limited or no code changes.
> > > > > > > > > >>>>> This means using the HConnectionManager, Connection,
> > > > > > > > > >>>>> and Table interfaces to hide complexities from the
> > > > > > > > > >>>>> developer (Connection and Table are the new classes
> > > > > > > > > >>>>> for HConnection and HTableInterface in HBase version
> > > > > > > > > >>>>> 0.99).
> > > > > > > > > >>>>> - Offer thresholds to allow developers to decide
> > > > > > > > > >>>>> between degrees of strong consistency and eventual
> > > > > > > > > >>>>> consistency.
> > > > > > > > > >>>>> - Support N number of linked HBase clusters.
> > > > > > > > > >>>>>
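The threshold goal above can be sketched as a read path that waits a bounded time on the primary before accepting an eventually consistent answer from a failover cluster; this is a simplified stand-in that assumes nothing about the real HBase.MCC internals (the class, method, and threshold names are invented):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class ThresholdRead {
    // Wait up to primaryWaitMs for the primary cluster's answer; past
    // that threshold, fall back to the failover cluster and accept an
    // eventually consistent result.
    static String readWithFailover(Supplier<String> primary,
                                   Supplier<String> failover,
                                   long primaryWaitMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = primary::get;
            Future<String> f = pool.submit(task);
            try {
                return f.get(primaryWaitMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException | InterruptedException e) {
                return failover.get();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Primary hangs longer than the 50 ms threshold, so the
        // failover cluster answers instead.
        Supplier<String> slowPrimary = () -> {
            try { Thread.sleep(2000); } catch (InterruptedException e) { }
            return "primary";
        };
        System.out.println(readWithFailover(slowPrimary, () -> "replica", 50));
    }
}
```

Tuning that wait threshold is exactly the knob the goals describe: a large value behaves like a strongly consistent single-cluster client, a small value trades consistency for latency and availability.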
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Read-Replicas*
> > > > > > > > > >>>>> Also note this is in alignment with read replicas and
> > > > > > > > > >>>>> can work with them.  This client is multi-cluster,
> > > > > > > > > >>>>> whereas read replicas help us be multi-region-server.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Replication*
> > > > > > > > > >>>>> You will also see in the document that this works with
> > > > > > > > > >>>>> current replication and requires no changes to it.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Only a Client change*
> > > > > > > > > >>>>> You will also see in the doc that this is only a new
> > > > > > > > > >>>>> client, which means no extra code for the end
> > > > > > > > > >>>>> developer, only additional configs to set it up.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Github*
> > > > > > > > > >>>>> There is a GitHub project that shows that this works,
> > > > > > > > > >>>>> at:
> > > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > > > > > > > >>>>> Note this is only a prototype.  When adding it to
> > > > > > > > > >>>>> HBase we will use it as a starting point, but there
> > > > > > > > > >>>>> will be changes.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> *Initial Results:*
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Red is where our primary cluster has failed, and you
> > > > > > > > > >>>>> will see from the bottom two graphs that our puts,
> > > > > > > > > >>>>> deletes, and gets are not interrupted.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Thanks
> > > > > > > > > >>>>> Ted Malaska
> > > > > > > > > >>>>
> > > > > > > > > >>>> The opinions expressed here are mine; while they may
> > > > > > > > > >>>> reflect a cognitive thought, that is purely accidental.
> > > > > > > > > >>>> Use at your own risk.
> > > > > > > > > >>>> Michael Segel
> > > > > > > > > >>>> michael_segel (AT) hotmail.com
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sean
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back.
> > > > > - Piet Hein (via Tom White)
> > > > >
> > > >
> > >
> >
>
