hbase-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Tue, 30 Jun 2015 15:20:18 GMT
Let me try to break your email down into points that everyone on the thread
can understand, and then I will give my counters.

1.     We shouldn’t do this because eventual consistency between the two
clusters is a bad thing.

2.     We shouldn’t do this because it is hard to figure out when to
fail over.

3.     We shouldn’t do this because we would have redundant clients,
meaning the gets, puts, or scans would run on both clusters when not
needed, and the results may be different.



*Point 1:*

            This is the point of the whole design.  If your primary cluster
is down, you have the option to configure HBase.MCC when your use case
needs some answer rather than no answer at all.  This is the same argument
made for Cassandra in use cases like this.  So eventual consistency is not
a valid reason to shut this idea down.



*Point 2:*

            If you read the document and the code (which you say you don’t
have enough time to do, but clearly you have time to insult people on
e-mail threads), you will find a number of parameters that allow the user
to define the thresholds that represent a cluster failover.  There are many
configurable options.  So allowing the user to define when cluster failure
happens removes your second point of concern.
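
To give a feel for the kind of knobs I mean, here is a minimal sketch of
such a configuration.  The property keys below are hypothetical
placeholders for illustration, not the actual names from the doc or the
prototype:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FailoverConfigSketch {
  // Hypothetical keys, for illustration only; not HBase.MCC's real names.
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // How long a call may block on the primary before trying a failover.
    conf.setLong("hbase.mcc.wait.time.before.failover.ms", 100L);
    // How many consecutive failures mark the primary cluster as down.
    conf.setInt("hbase.mcc.failures.before.failover", 3);
    // How long a failed primary is skipped before the client retries it.
    conf.setLong("hbase.mcc.primary.retry.interval.ms", 30000L);
    return conf;
  }
}
```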



*Point 3:*

            We will be doing dual actions on both clusters, and those
answers may not be the same.  This counterpoint is really a combination of
the first two:

-       *Point 1 counter:* Yes, we know that if we are requesting from a
failover cluster the result might not be 100% up to date.  Answers from the
failover cluster are logged, and we can even tell the user which cluster
the result came from if needed.  So this isn’t an issue.

-       *Point 2 counter:* Again, there are configs both for accepting
requests from other clusters and for deciding when to send requests to the
other cluster, so we could configure the client in multiple ways.  Option
1: always request from both and take the first to return.  Option 2:
request from the primary, wait a given period of time, then request from
the failover cluster.  Option 3: request from both, wait N milliseconds for
the primary, and if its answer doesn’t come, take the failover’s.  (See the
sketch below for Option 2.)
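
To make Option 2 concrete, here is a minimal sketch of the "request from
the primary, wait, then race the failover" behavior, written against plain
java.util.concurrent and the standard Table interface.  The SpeculativeGet
class is hypothetical, not the actual HBase.MCC code:

```java
import java.util.concurrent.*;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class SpeculativeGet {
  private final Table primary;   // table handle on the primary cluster
  private final Table failover;  // table handle on the failover cluster
  private final ExecutorService pool = Executors.newCachedThreadPool();

  public SpeculativeGet(Table primary, Table failover) {
    this.primary = primary;
    this.failover = failover;
  }

  /** Option 2: ask the primary alone; after waitMs with no answer,
   *  ask the failover too and return whichever cluster answers first. */
  public Result get(Get get, long waitMs) throws Exception {
    Future<Result> fromPrimary = pool.submit(() -> primary.get(get));
    try {
      return fromPrimary.get(waitMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException primaryIsSlow) {
      CompletionService<Result> race = new ExecutorCompletionService<>(pool);
      race.submit(() -> fromPrimary.get());  // keep waiting on the primary
      race.submit(() -> failover.get(get));  // ask the failover as well
      return race.take().get();              // first answer wins
    }
  }
}
```

Options 1 and 3 fall out of the same pattern: submit to both clusters up
front, then either take the first answer or give the primary a grace period
before accepting the failover’s.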



Let me know what you think of these.  If you have more questions I’m here
all day, and I’m happy to work through the design with you.



The fact is I’m not on this thread because I want to have a flame war over
e-mail.  I’m on this thread because this is a real problem and a real
demand from our users, and I would like to help meet that need.  Is this
design perfect?  Maybe not, but it is a good first step, and with your help
and the help of others on HBase I’m sure we can add additional
configurations or functionality to plug any holes and make it a very
powerful offering in the market.



Ted Malaska

On Tue, Jun 30, 2015 at 10:55 AM, Michael Segel <michael_segel@hotmail.com>
wrote:

> Sean,
>
> Are you a developer, or just some higher-level primate that pounds code?
>
> I don’t want to embarrass you, but what do they teach in engineering
> schools these days?
>
> An ‘elevator pitch’ is more than just some sound bites. The 30 second
> pitch is a high level brief or an executive summary that is supposed to
> catch the attention of your audience.  Good luck getting the attention of
> your business sponsor to approve your project if you can’t explain it
> quickly and concisely. The reason I asked for it is that what Ted wrote had
> two different connotations.
>
> But I digress.
>
> As you can see from the ensuing messages in this thread, the clarification
> that I requested was needed.  (The Big Table presentation wasn’t talking
> about what Ted wanted to do.)
>
> Which brings us to the point of my questions to Ted.
>
> While it’s great that Ted is thinking, he needs to think about the problem
> and why what he is asking for is not such a great idea.
>
> Ted,
> I would have to say that trying to create a fault-tolerant client for
> connecting to paired clusters won’t work well in practice, and here’s why:
>
> Draw out a pretty picture. You have Cluster A represented by a cloud on
> the left hand side.  You have cluster B represented on the right hand side.
> You have an arrow going from Cluster A to Cluster B. This represents the
> flow of the data replication.  Now draw below a stick figure, or a circle,
> to represent the client.  Now draw two bidirectional arrows between the
> client and the two clusters.  You have a triangle.  So you can have either
> an Active/Active or an Active/Passive scenario.
>
> If you have Active/Passive, then you’re not really going to want to serve
> data from Cluster B until Cluster A fails.  If you have Active/Active,
> then you have both clusters serving data.  Which gets us back to another
> point…
> You have to have another box or circle representing the data ingestion
> source.  You then draw the line to cluster A.  (You said replication to
> Cluster B.)  So now you have the following scenario…
>
> Data flows into cluster A; it’s then replicated to cluster B. That
> replication doesn’t happen instantaneously, so there is some time delta
> where data exists in one cluster but not the other.  So if you did a set
> of simultaneous scans or ran a server-side function (using coprocessors)
> you would not be guaranteed the same result.  (Ooops!)
>
> Eventual consistency between the two clusters will kill you.
>
> The other issue… what constitutes a failure so that you don’t want to
> connect to Cluster A, but to Cluster B?
>
> The safest thing is to fail, recognize that you failed, and then connect
> to cluster B.
>
> The problem you face is that you have a lot of variables working against
> you.  Are the clusters homogeneous?  Are the pipes to either data center
> the same?  Is the pipe between each data center large enough?  For
> example, you may have a 40 Gb/s pipe between data centers, but if you’ve
> got a lot of systems pushing data over the network, you can still see
> delays between the two clusters, even assuming the data centers are only
> 30km - 50km apart (or less) so that you have a minimal amount of network
> latency.
>
> At the same time… suppose you have your data source write to both clusters
> simultaneously and you’re serving data from both clusters simultaneously.
> You may still have small windows where the clusters’ results will not be
> consistent. (But let’s assume that they are consistent.) We now have your
> redundant client. You would therefore run the same scan across both data
> centers. Assuming that you get the same results, you take the first data
> set back and use it. But you’re constantly using resources on the second
> cluster and wasting them.
>
> You’re suggesting a generic client and that means that you’re wasting
> cluster resources all the time.
>
> If you had a specific application… then your solution is a one-off
> solution. So here, you might as well just write a multi-threaded client
> where each connection is thread safe, and overload the scan call to pass
> the scan to both connections.
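
For what it’s worth, a bare-bones sketch of that one-off, dual-connection
client could look like the following; the class and its field names are
hypothetical:

```java
import java.util.concurrent.*;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

/** Issues the same scan against both clusters, returns the first to open. */
public class DualScanClient {
  private final Table tableA;  // table handle on cluster A
  private final Table tableB;  // table handle on cluster B
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  public DualScanClient(Table tableA, Table tableB) {
    this.tableA = tableA;
    this.tableB = tableB;
  }

  public ResultScanner scan(Scan scan) throws Exception {
    CompletionService<ResultScanner> both = new ExecutorCompletionService<>(pool);
    // Scan instances are not thread safe, so hand each cluster its own copy.
    both.submit(() -> tableA.getScanner(new Scan(scan)));
    both.submit(() -> tableB.getScanner(new Scan(scan)));
    // First scanner to open wins; the losing cluster's work is wasted,
    // which is exactly the resource cost described above.
    return both.take().get();
  }
}
```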
>
> HTH
>
>
>
>
> > On Jun 29, 2015, at 3:09 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >
> > Michael,
> >
> > This is the dev list, no sound-bite pitch is needed. We have plenty of
> > features whose nuance takes time to explain. Please either engage with
> > the complexity of the topic or wait for the feature to land and get
> > user-accessible documentation. We all get busy from time to time, but
> > that's no reason to push a higher burden on those who are currently
> > engaged with a particular effort, especially this early in development.
> >
> > That said, the first paragraph gives a suitable brief motivation
> > (slightly rephrased below):
> >
> >> Some applications require response and availability SLAs that a single
> >> HBase cluster can not meet alone. Particularly for high percentiles,
> >> queries to a single cluster can be delayed by e.g. GC pauses, individual
> >> server process failure, or maintenance activity. By providing clients
> >> with a transparent multi-cluster configuration option we can avoid these
> >> outlier conditions by masking these failures from applications that are
> >> tolerant to weaker consistency guarantees than HBase provides out of the
> >> box.
> >
> >
> > Ted,
> >
> > Thanks for writing this up! We'd prefer to keep discussion of it on the
> > mailing list, so please avoid moving to private webexes.
> >
> > Would you mind if I or one of the other community members converted the
> > design doc to pdf so that it's more accessible?
> >
> >
> >
> > On Mon, Jun 29, 2015 at 4:52 PM, Ted Malaska <ted.malaska@cloudera.com>
> > wrote:
> >
> >> Why don't we set up a webex to talk out the details?  What times are
> >> you open to talk this week?
> >>
> >> But to answer your questions: this is for active/active and
> >> active/failover clusters.  There is a primary and N number of failovers
> >> per client.  This is for gets and puts.
> >>
> >> There are a number of configs in the doc to define how to fail over.
> >> The options allow a couple of different use cases.  There is a lot of
> >> detail in the doc and I just didn't want to put it all in the email.
> >>
> >> But honestly, I put a lot of time into the doc.  I would love to know
> >> what you think.
> >> On Jun 29, 2015 5:46 PM, "Michael Segel" <michael_segel@hotmail.com>
> >> wrote:
> >>
> >>> Ted,
> >>>
> >>> If you can’t do a 30 second pitch, then it’s not worth the effort. ;-)
> >>>
> >>> Look, when someone says that they want to have a single client talk to
> >>> multiple HBase clusters, that could mean two very different things.
> >>> First, you could mean that you want a single client to connect to an
> >>> active/active pair of HBase clusters where they replicate to each
> >>> other.  (Active/Passive would also be implied, but then you have the
> >>> issue of when the passive cluster goes active.)
> >>>
> >>> Then you have the issue of someone wanting to talk to multiple
> >>> different clusters so that they can query the data, create local data
> >>> sets which they wish to join, combining data from various sources.
> >>>
> >>> The second is a different problem from the first.
> >>>
> >>> -Mike
> >>>
> >>>> On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com>
> >>>> wrote:
> >>>>
> >>>> Hey Michael,
> >>>>
> >>>> Read the doc please.  It goes through everything at a low level.
> >>>>
> >>>> Thanks
> >>>> Ted Malaska
> >>>>
> >>>> On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com>
> >>>> wrote:
> >>>>
> >>>>> No down time?
> >>>>>
> >>>>> So you want a client to go against a pair of active/active HBase
> >>>>> instances on tied clusters?
> >>>>>
> >>>>>
> >>>>>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hey Michael,
> >>>>>>
> >>>>>> The use case is simple: "no down time" use cases, even in the case
> >>>>>> of site failure.
> >>>>>>
> >>>>>> Now, on this statement:
> >>>>>> "Why not simply manage each connection/context via a threaded
> >>>>>> child?"
> >>>>>>
> >>>>>> That is the point: to make that simple, tested, easy, and
> >>>>>> transparent for HBase users.
> >>>>>>
> >>>>>> Ted Malaska
> >>>>>>
> >>>>>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> So if I understand your goal, you want a client who can connect to
> >>>>>>> one or more HBase clusters at the same time…
> >>>>>>>
> >>>>>>> Ok, so let’s walk through the use case and help me understand a
> >>>>>>> couple of use cases for this…
> >>>>>>>
> >>>>>>> Why not simply manage each connection/context via a threaded child?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hey Dev List,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> My name is Ted Malaska, long time lover and user of HBase.  I
> >>>>>>>> would like to discuss adding a multi-cluster client to HBase.
> >>>>>>>> Here is the link for the design doc (
> >>>>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
> >>>>>>>> ), but I have pulled some parts into this e-mail to give you a
> >>>>>>>> high-level understanding of its scope.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *Goals*
> >>>>>>>>
> >>>>>>>> The proposed solution is a multi-cluster HBase client that relies
> >>>>>>>> on the existing HBase Replication functionality to provide an
> >>>>>>>> eventually consistent solution in cases of primary cluster down
> >>>>>>>> time.
> >>>>>>>>
> >>>>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Additional goals are:
> >>>>>>>>
> >>>>>>>> - Be able to switch from a single HBase cluster to the Multi-HBase
> >>>>>>>> Client with limited or no code changes (see the sketch after this
> >>>>>>>> list).  This means using the HConnectionManager, Connection, and
> >>>>>>>> Table interfaces to hide complexities from the developer
> >>>>>>>> (Connection and Table are the new classes for HConnection and
> >>>>>>>> HTableInterface in HBase version 0.99).
> >>>>>>>> - Offer thresholds to allow developers to decide between degrees
> >>>>>>>> of strongly consistent and eventually consistent.
> >>>>>>>> - Support N number of linked HBase clusters.
> >>>>>>>>
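
To make the first goal above concrete, here is a minimal sketch of what
"limited or no code changes" could look like.  The application code is
written purely against the standard Connection and Table interfaces; the
closing comment marks where a hypothetical multi-cluster factory (an
illustrative name, not the actual HBase.MCC entry point) would slot in:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class DropInUsage {
  // Application code sees only the standard interfaces; it cannot tell
  // whether this Connection talks to one cluster or to several.
  static Result read(Connection conn, byte[] row) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("t1"))) {
      return table.get(new Get(row));
    }
  }
  // Only the line that builds the Connection would change: e.g. a
  // hypothetical MultiClusterConnectionFactory.create(primaryConf,
  // failoverConf) in place of ConnectionFactory.createConnection(conf).
}
```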
> >>>>>>>>
> >>>>>>>> *Read-Replicas*
> >>>>>>>> Also note this is in alignment with Read-Replicas and can work
> >>>>>>>> with that.  This client is multi-cluster, where Read-Replicas help
> >>>>>>>> us to be multi-Region-Server.
> >>>>>>>>
> >>>>>>>> *Replication*
> >>>>>>>> You will also see in the document that this works with current
> >>>>>>>> replication and requires no changes to it.
> >>>>>>>>
> >>>>>>>> *Only a Client change*
> >>>>>>>> You will also see in the doc that this is only a new client, which
> >>>>>>>> means no extra code for the end developer, only additional configs
> >>>>>>>> to set it up.
> >>>>>>>>
> >>>>>>>> *Github*
> >>>>>>>> This is a GitHub project that shows this works:
> >>>>>>>> https://github.com/tmalaska/HBase.MCC
> >>>>>>>> Note this is only a prototype.  When adding it to HBase we will
> >>>>>>>> use it as a starting point, but there will be changes.
> >>>>>>>>
> >>>>>>>> *Initial Results:*
> >>>>>>>>
> >>>>>>>> Red is where our primary cluster has failed, and you will see
> >>>>>>>> from the bottom two graphs that our puts, deletes, and gets are
> >>>>>>>> not interrupted.
> >>>>>>>>
> >>>>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Ted Malaska
> >>>>>>>
> >>>>>>> The opinions expressed here are mine, while they may reflect a
> >>>>>>> cognitive thought, that is purely accidental.
> >>>>>>> Use at your own risk.
> >>>>>>> Michael Segel
> >>>>>>> michael_segel (AT) hotmail.com
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>> The opinions expressed here are mine, while they may reflect a
> >>>>> cognitive thought, that is purely accidental.
> >>>>> Use at your own risk.
> >>>>> Michael Segel
> >>>>> michael_segel (AT) hotmail.com
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>> The opinions expressed here are mine, while they may reflect a
> >>> cognitive thought, that is purely accidental.
> >>> Use at your own risk.
> >>> Michael Segel
> >>> michael_segel (AT) hotmail.com
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Sean
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>
