hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Tue, 30 Jun 2015 13:18:54 GMT
Thanks Ted.

Ya, as you said, the idea is to solve a bigger use case where there is a
globally distributed cluster but the data is local to each cluster - i.e.
the data that we write and read is local to that geography or cluster.  The
Cross-site Big Table will help you to read and write from such a cluster
transparently, just by differentiating them with a cluster id.

But the other subset of the problem that HBase.MCC solves can also be
achieved because the failover switching during writes/reads happens based
on the replication setup that is available in that local cluster.

The state of CSBT - I need to check the latest update, but it was earlier
discussed that CSBT could not be part of the hbase package and would
instead be a stand-alone tool. I can get an update on that.

Regards
Ram


On Tue, Jun 30, 2015 at 5:05 PM, Ted Malaska <ted.malaska@cloudera.com>
wrote:

> Hey Ramkrishna,
>
> I think you're right that there are some things that are the same.  The
> difference is the problem they are trying to solve and the scope.
>
> In the HBase.MCC design it is only about cluster failover and keeping 100%
> uptime in the case of single-site failure.  The Cross-site Big Table looks
> to have some of that too, but it is also more complex because it has the
> requirement of data being local to a single cluster.  So you need to see
> all the clusters to get all the data.
>
> Maybe I'm wrong, but they are not solving the same problem.  Also, because
> of HBase.MCC's limited scope it is far easier to implement and maintain.
>
> Now, although I agree that the Cross-site Big Table has a valid use case,
> the use case for HBase.MCC is more to level the ground with Cassandra in
> the marketplace - to allow us to have eventual consistency in the case of
> single-site failure, with configs to determine what thresholds must be
> passed before accepting those eventually consistent records.
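To make the threshold idea concrete, here is a toy, self-contained sketch. This is not the HBase.MCC code or API; the class, field, and threshold names are invented for illustration. It models a read that waits a bounded time for the primary cluster and then falls back to an eventually consistent replica:

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Toy model of a threshold-based failover read: wait up to a configured
// threshold for the primary cluster; on timeout, return the replica's
// (possibly stale) value instead of failing the request.
class FailoverGet {
    static final long PRIMARY_WAIT_MS = 50; // invented threshold "config"

    private final Map<String, String> primary;  // stands in for the primary cluster
    private final Map<String, String> replica;  // eventually consistent copy
    private final boolean primaryDown;          // simulate a site failure

    FailoverGet(Map<String, String> primary, Map<String, String> replica,
                boolean primaryDown) {
        this.primary = primary;
        this.replica = replica;
        this.primaryDown = primaryDown;
    }

    String get(String key) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(() -> {
                if (primaryDown) Thread.sleep(60_000); // a hung site never answers
                return primary.get(key);
            });
            try {
                // Strongly consistent answer if the primary responds in time...
                return f.get(PRIMARY_WAIT_MS, TimeUnit.MILLISECONDS);
            } catch (Exception e) { // timeout, interrupt, or failure
                f.cancel(true);
                // ...otherwise the eventually consistent answer from the replica.
                return replica.get(key);
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The real client would of course issue HBase Gets rather than map lookups; the point is only that the failover threshold is a config knob, not application code.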
>
> This will allow HBase to better compete for use cases that involve Near
> Real Time Streaming.  This is important because the new hot trend in the
> market today is to move your batch to near real time.  I think HBase is
> the best solution out there today for this, but for the fact that on site
> or region server failure we lose functionality (read and write on site
> failure, and write on RS failure).
>
> In the end, HBase.MCC's scope is what hopefully should make it exciting.
> All we need to do is make a new client and update the connection factory
> to give you that multi-cluster client when requested through the configs.
> Nothing in ZK or HBase core would have to be touched.
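As a rough illustration of the connection-factory idea quoted above (all names here are invented; this is not the real HBase ConnectionFactory API), the shape might be: the factory inspects a config flag and hands back either a plain connection or the multi-cluster wrapper, so application code is unchanged.

```java
import java.util.Properties;

// Minimal sketch: the factory decides, from config alone, whether the
// caller gets a single-cluster or multi-cluster connection.
interface Conn {
    String describe();
}

class SingleClusterConn implements Conn {
    public String describe() { return "single-cluster"; }
}

class MultiClusterConn implements Conn {
    public String describe() { return "multi-cluster"; }
}

class ConnFactory {
    // "hbase.mcc.enabled" is a made-up property name for illustration.
    static Conn createConnection(Properties conf) {
        boolean multi = Boolean.parseBoolean(
                conf.getProperty("hbase.mcc.enabled", "false"));
        return multi ? new MultiClusterConn() : new SingleClusterConn();
    }
}
```

This mirrors the shape of HBase's ConnectionFactory.createConnection(conf): the caller's code never names the multi-cluster client directly.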
>
> Side note: Because of the flexibility in the HBase.MCC configs there is a
> way to reach a good majority of the Cross-site BigTable goals with just
> HBase.MCC.
> Last question: What became of Cross-site BigTable?
>
> Let me know if you find this correct.
> Thanks
> Ted Malaska
>
> On Tue, Jun 30, 2015 at 12:42 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > Hi Ted
> >
> > I think the idea here is very similar to the Cross-site Big Table project
> > that was presented in HBaseCon 2014.
> >
> > Please find the slide link below:
> > http://www.slideshare.net/HBaseCon/ecosystem-session-3.
> > This project also adds client-side wrappers so that the client can
> > internally do a failover in case of a cluster going down, automatically
> > switching over to the replicated clusters based on the configurations.
> > Let us know if you find this interesting.
> >
> > Regards
> > Ram
> >
> >
> >
> > On Tue, Jun 30, 2015 at 4:01 AM, Ted Malaska <ted.malaska@cloudera.com>
> > wrote:
> >
> > > lol I did sorry, this is the right doc
> > >
> > >
> > >
> >
> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx.pdf
> > >
> > > On Mon, Jun 29, 2015 at 6:30 PM, Andrew Purtell <apurtell@apache.org>
> > > wrote:
> > >
> > > > I think you may have put up the wrong document? That link goes to a
> > > > product doc.
> > > >
> > > >
> > > > On Mon, Jun 29, 2015 at 3:24 PM, Ted Malaska <
> ted.malaska@cloudera.com
> > >
> > > > wrote:
> > > >
> > > > > Here is the PDF link.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/tmalaska/HBase.MCC/blob/master/MultiClusterAndEDH_Latest.docx.pdf
> > > > >
> > > > > On Mon, Jun 29, 2015 at 6:09 PM, Sean Busbey <busbey@cloudera.com>
> > > > wrote:
> > > > >
> > > > > > Michael,
> > > > > >
> > > > > > This is the dev list, no sound-bite pitch is needed. We have
> > > > > > plenty of features that take time to explain the nuance. Please
> > > > > > either engage with the complexity of the topic or wait for the
> > > > > > feature to land and get user-accessible documentation. We all get
> > > > > > busy from time to time, but that's no reason to push a higher
> > > > > > burden on those who are currently engaged with a particular
> > > > > > effort, especially this early in development.
> > > > > >
> > > > > > That said, the first paragraph gives a suitable brief motivation
> > > > > > (slightly rephrased below):
> > > > > >
> > > > > > > Some applications require response and availability SLAs that a
> > > > > > > single HBase cluster can not meet alone. Particularly for high
> > > > > > > percentiles, queries to a single cluster can be delayed by e.g.
> > > > > > > GC pauses, individual server process failure, or maintenance
> > > > > > > activity. By providing clients with a transparent multi-cluster
> > > > > > > configuration option we can avoid these outlier conditions,
> > > > > > > masking these failures from applications that are tolerant to
> > > > > > > weaker consistency guarantees than HBase provides out of the box.
> > > > > >
> > > > > >
> > > > > > Ted,
> > > > > >
> > > > > > Thanks for writing this up! We'd prefer to keep discussion of it
> > > > > > on the mailing list, so please avoid moving to private webexes.
> > > > > >
> > > > > > Would you mind if I or one of the other community members
> > > > > > converted the design doc to PDF so that it's more accessible?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <
> > > ted.malaska@cloudera.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Why don't we set up a webex to talk out the details?  What
> > > > > > > times are you open to talk this week?
> > > > > > >
> > > > > > > But to answer your questions: this is for active-active and
> > > > > > > active-failover clusters.  There is a primary and N number of
> > > > > > > failovers per client.  This is for gets and puts.
> > > > > > >
> > > > > > > There are a number of configs in the doc to define how to
> > > > > > > failover.  The options allow a couple of different use cases.
> > > > > > > There is a lot of detail in the doc and I just didn't want to
> > > > > > > put it all in the email.
> > > > > > >
> > > > > > > But honestly, I put a lot of time into the doc.  I would love
> > > > > > > to know what you think.
> > > > > > > On Jun 29, 2015 5:46 PM, "Michael Segel" <
> > > michael_segel@hotmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Ted,
> > > > > > > >
> > > > > > > > If you can’t do a 30-second pitch, then it’s not worth the
> > > > > > > > effort. ;-)
> > > > > > > >
> > > > > > > > Look, when someone says that they want to have a single
> > > > > > > > client talk to multiple HBase clusters, that could mean two
> > > > > > > > very different things.
> > > > > > > > First, you could mean that you want a single client to
> > > > > > > > connect to an active/active pair of HBase clusters where they
> > > > > > > > replicate to each other.  (Active/passive would also be
> > > > > > > > implied, but then you have the issue of when the passive
> > > > > > > > cluster goes active.)
> > > > > > > >
> > > > > > > > Then you have the issue of someone wanting to talk to
> > > > > > > > multiple different clusters so that they can query the data,
> > > > > > > > creating local data sets which they wish to join, combining
> > > > > > > > data from various sources.
> > > > > > > >
> > > > > > > > The second is a different problem from the first.
> > > > > > > >
> > > > > > > > -Mike
> > > > > > > >
> > > > > > > > > On Jun 29, 2015, at 3:38 PM, Ted Malaska <
> > > > ted.malaska@cloudera.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hey Michael,
> > > > > > > > >
> > > > > > > > > Read the doc please.  It goes through everything at a low
> > > > > > > > > level.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ted Malaska
> > > > > > > > >
> > > > > > > > > On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel
<
> > > > > > > > michael_segel@hotmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> No down time?
> > > > > > > > >>
> > > > > > > > >> So you want a client to go against a pair of active/active
> > > > > > > > >> hbase instances on tied clusters?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> On Jun 29, 2015, at 3:20 PM, Ted Malaska
<
> > > > > ted.malaska@cloudera.com
> > > > > > >
> > > > > > > > >> wrote:
> > > > > > > > >>>
> > > > > > > > >>> Hey Michael,
> > > > > > > > >>>
> > > > > > > > >>> The use case is simple: "no down time use cases", even in
> > > > > > > > >>> the case of site failure.
> > > > > > > > >>>
> > > > > > > > >>> Now, on this statement:
> > > > > > > > >>> "Why not simply manage each connection/context via a
> > > > > > > > >>> threaded child?"
> > > > > > > > >>>
> > > > > > > > >>> That is the point: to make that simple, tested, easy, and
> > > > > > > > >>> transparent for HBase users.
> > > > > > > > >>>
> > > > > > > > >>> Ted Malaska
> > > > > > > > >>>
> > > > > > > > >>> On Mon, Jun 29, 2015 at 4:11 PM, Michael
Segel <
> > > > > > > > >> michael_segel@hotmail.com>
> > > > > > > > >>> wrote:
> > > > > > > > >>>
> > > > > > > > >>>> So if I understand your goal, you want a client who can
> > > > > > > > >>>> connect to one or more hbase clusters at the same time…
> > > > > > > > >>>>
> > > > > > > > >>>> Ok, so let's walk through the use case and help me
> > > > > > > > >>>> understand a couple of use cases for this…
> > > > > > > > >>>>
> > > > > > > > >>>> Why not simply manage each connection/context via a
> > > > > > > > >>>> threaded child?
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>> On Jun 29, 2015, at 1:48 PM,
Ted Malaska <
> > > > > > ted.malaska@cloudera.com
> > > > > > > >
> > > > > > > > >>>> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>> Hey Dev List,
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>> My name is Ted Malaska, long-time lover and user of
> > > > > > > > >>>>> HBase.  I would like to discuss adding a multi-cluster
> > > > > > > > >>>>> client into HBase.  Here is the link for the design doc (
> > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> > > > > > > > >>>>> ) but I have pulled some parts into this main e-mail to
> > > > > > > > >>>>> give you a high-level understanding of its scope.
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Goals*
> > > > > > > > >>>>>
> > > > > > > > >>>>> The proposed solution is a multi-cluster HBase client
> > > > > > > > >>>>> that relies on the existing HBase Replication
> > > > > > > > >>>>> functionality to provide an eventually consistent
> > > > > > > > >>>>> solution in cases of primary cluster down time.
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > >
> > > https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>> Additional goals are:
> > > > > > > > >>>>>
> > > > > > > > >>>>> - Be able to switch from a single HBase cluster to the
> > > > > > > > >>>>> Multi-HBase Client with limited or no code changes.
> > > > > > > > >>>>> This means using the HConnectionManager, Connection, and
> > > > > > > > >>>>> Table interfaces to hide complexities from the developer
> > > > > > > > >>>>> (Connection and Table are the new classes for HConnection
> > > > > > > > >>>>> and HTableInterface in HBase version 0.99).
> > > > > > > > >>>>> - Offer thresholds to allow developers to decide between
> > > > > > > > >>>>> degrees of strongly consistent and eventually consistent.
> > > > > > > > >>>>> - Support N number of linked HBase clusters
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Read-Replicas*
> > > > > > > > >>>>> Also note this is in alignment with Read-Replicas and
> > > > > > > > >>>>> can work with that.  This client is multi-cluster, where
> > > > > > > > >>>>> Read-Replicas help us to be multi-Region-Server.
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Replication*
> > > > > > > > >>>>> You will also see in the document that this works with
> > > > > > > > >>>>> current replication and requires no changes to it.
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Only a Client change*
> > > > > > > > >>>>> You will also see in the doc that this is only a new
> > > > > > > > >>>>> client, which means no extra code for the end developer,
> > > > > > > > >>>>> only additional configs to set it up.
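As a sketch of what "only additional configs" might mean in practice, an hbase-site.xml fragment could look like the following. These property names are hypothetical, chosen to mirror the design doc's idea rather than its actual keys:

```xml
<!-- Hypothetical property names, for illustration only -->
<property>
  <name>hbase.mcc.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.mcc.failover.clusters</name>
  <value>zk-cluster2:2181,zk-cluster3:2181</value>
</property>
<property>
  <name>hbase.mcc.wait.time.before.failover.ms</name>
  <value>50</value>
</property>
```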
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Github*
> > > > > > > > >>>>> This is a github project that shows that this works at:
> > > > > > > > >>>>> https://github.com/tmalaska/HBase.MCC
> > > > > > > > >>>>> Note this is only a prototype.  When adding it to HBase
> > > > > > > > >>>>> we will use it as a starting point, but there will be
> > > > > > > > >>>>> changes.
> > > > > > > > >>>>>
> > > > > > > > >>>>> *Initial Results:*
> > > > > > > > >>>>>
> > > > > > > > >>>>> Red is where our primary cluster has failed, and you
> > > > > > > > >>>>> will see from the bottom two graphs that our puts,
> > > > > > > > >>>>> deletes, and gets are not interrupted.
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thanks
> > > > > > > > >>>>> Ted Malaska
> > > > > > > > >>>>
> > > > > > > > >>>> The opinions expressed here are mine,
while they may
> > > reflect a
> > > > > > > > cognitive
> > > > > > > > >>>> thought, that is purely accidental.
> > > > > > > > >>>> Use at your own risk.
> > > > > > > > >>>> Michael Segel
> > > > > > > > >>>> michael_segel (AT) hotmail.com
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sean
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> >
>
