hbase-dev mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: [DISCUSS] Multi-Cluster HBase Client
Date Mon, 29 Jun 2015 21:46:50 GMT
Ted, 

If you can’t do a 30-second pitch, then it’s not worth the effort. ;-)

Look, when someone says that they want to have a single client talk to multiple HBase clusters,
that could mean two very different things. 
First, you could mean that you want a single client to connect to an active/active pair of
HBase clusters where they replicate to each other. 
(Active / Passive would also be implied, but then you have the issue of when does the passive
cluster go active? ) 

Second, you could mean that someone wants to talk to multiple different clusters so that
they can query the data and create local data sets which they wish to join, combining data from
various sources.

The second is a different problem from the first. 
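
A minimal sketch of the first case (a client in front of a replicated pair), only to make the distinction concrete. Everything named here is hypothetical: ClusterReader, InMemoryCluster and FailoverReader are illustrative stand-ins, not HBase classes and not the design in Ted's doc. It assumes the clusters replicate to each other, so a read served by the fallback may be stale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one cluster's read path; not an HBase interface.
interface ClusterReader {
    String name();
    // Throws a RuntimeException when the cluster cannot be reached.
    Optional<String> get(String rowKey);
}

// In-memory fake used only to exercise the failover logic.
final class InMemoryCluster implements ClusterReader {
    private final String name;
    private final Map<String, String> rows;
    private boolean down = false;

    InMemoryCluster(String name, Map<String, String> rows) {
        this.name = name;
        this.rows = new HashMap<>(rows);
    }

    void setDown(boolean down) { this.down = down; }

    @Override public String name() { return name; }

    @Override public Optional<String> get(String rowKey) {
        if (down) throw new IllegalStateException(name + " is unreachable");
        return Optional.ofNullable(rows.get(rowKey));
    }
}

// Reads from the primary first and falls back to the other clusters in order.
final class FailoverReader {
    // Records which cluster answered, since a replica may lag the primary.
    static final class Result {
        final String servedBy;
        final Optional<String> value;

        Result(String servedBy, Optional<String> value) {
            this.servedBy = servedBy;
            this.value = value;
        }
    }

    private final List<ClusterReader> clusters; // index 0 is the primary

    FailoverReader(List<ClusterReader> clusters) {
        if (clusters.isEmpty()) {
            throw new IllegalArgumentException("need at least one cluster");
        }
        this.clusters = new ArrayList<>(clusters);
    }

    Result get(String rowKey) {
        RuntimeException last = null;
        for (ClusterReader cluster : clusters) {
            try {
                return new Result(cluster.name(), cluster.get(rowKey));
            } catch (RuntimeException e) {
                last = e; // treat as "cluster down" and try the next one
            }
        }
        throw new IllegalStateException("all clusters failed", last);
    }
}
```

Returning the name of the cluster that answered lets the caller decide whether a possibly stale read is acceptable. The second case has no failover at all: it is N independent connections plus a join on the client side.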

-Mike

> On Jun 29, 2015, at 3:38 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
> 
> Hey Michael,
> 
> Read the doc please.  It goes through everything at a low level.
> 
> Thanks
> Ted Malaska
> 
> On Mon, Jun 29, 2015 at 4:36 PM, Michael Segel <michael_segel@hotmail.com>
> wrote:
> 
>> No down time?
>> 
>> So you want a client to go against a pair of active/active hbase instances
>> on tied clusters?
>> 
>> 
>>> On Jun 29, 2015, at 3:20 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
>>> 
>>> Hey Michael,
>>> 
>>> The use case is simple: "no downtime" use cases, even in the case of site
>>> failure.
>>> 
>>> Now on this statement
>>> "Why not simply manage each connection/context via a threaded child?"
>>> 
>>> That is the point, to make that simple, tested, easy, and transparent for
>>> HBase users.
>>> 
>>> Ted Malaska
>>> 
>>> On Mon, Jun 29, 2015 at 4:11 PM, Michael Segel <michael_segel@hotmail.com>
>>> wrote:
>>> 
>>>> So if I understand your goal, you want a client that can connect to one or
>>>> more hbase clusters at the same time…
>>>> 
>>>> Ok, so let's walk through this and help me understand a couple of
>>>> use cases for it…
>>>> 
>>>> Why not simply manage each connection/context via a threaded child?
>>>> 
>>>> 
>>>> 
>>>>> On Jun 29, 2015, at 1:48 PM, Ted Malaska <ted.malaska@cloudera.com> wrote:
>>>>> 
>>>>> Hey Dev List,
>>>>> 
>>>>> 
>>>>> My name is Ted Malaska, long time lover and user of HBase. I would like to
>>>>> discuss adding a multi-cluster client to HBase. Here is the link for
>>>>> the design doc (
>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/MultiHBaseClientDesignDoc.docx%20(1).docx
>>>>> )
>>>>> but I have pulled some parts into this main e-mail to give you a high-level
>>>>> understanding of its scope.
>>>>> 
>>>>> 
>>>>> *Goals*
>>>>> 
>>>>> The proposed solution is a multi-cluster HBase client that relies on the
>>>>> existing HBase Replication functionality to provide an eventually
>>>>> consistent solution in cases of primary cluster downtime.
>>>>> 
>>>>> 
>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/FailoverImage.png
>>>>> 
>>>>> 
>>>>> Additional goals are:
>>>>> 
>>>>> - Be able to switch from a single HBase cluster to the Multi-HBase Client
>>>>>   with limited or no code changes. This means using the HConnectionManager,
>>>>>   Connection, and Table interfaces to hide complexities from the developer
>>>>>   (Connection and Table are the new classes for HConnection and
>>>>>   HTableInterface in HBase version 0.99).
>>>>> - Offer thresholds to allow developers to decide between degrees of
>>>>>   strongly consistent and eventually consistent.
>>>>> - Support N linked HBase clusters
>>>>> 
>>>>> 
>>>>> *Read-Replicas*
>>>>> Also note this is in alignment with Read-Replicas and can work with that.
>>>>> This client is multi-cluster, where Read-Replicas help us to be multi Region
>>>>> Server.
>>>>> 
>>>>> *Replication*
>>>>> You will also see in the document that this works with current replication
>>>>> and requires no changes to it.
>>>>> 
>>>>> *Only a Client change*
>>>>> You will also see in the doc this is only a new client, which means no
>>>>> extra code for the end developer, only additional configs to set it up.
>>>>> 
>>>>> *Github*
>>>>> This is a github project that shows that this works at:
>>>>> https://github.com/tmalaska/HBase.MCC
>>>>> Note this is only a prototype. When adding it to HBase we will use it as a
>>>>> starting point but there will be changes.
>>>>> 
>>>>> *Initial Results:*
>>>>> 
>>>>> Red is where our primary cluster has failed, and you will see from the
>>>>> bottom two graphs that our puts, deletes, and gets are not interrupted.
>>>>> 
>>>>> https://github.com/tmalaska/HBase.MCC/blob/master/AveragePutTimeWithMultiRestartsAndShutDowns.png
>>>>> 
>>>>> Thanks
>>>>> Ted Malaska
>>>> 
>>>> The opinions expressed here are mine, while they may reflect a cognitive
>>>> thought, that is purely accidental.
>>>> Use at your own risk.
>>>> Michael Segel
>>>> michael_segel (AT) hotmail.com

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





