hbase-dev mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: How to make HBase survive a laptop standby
Date Thu, 05 Jun 2014 12:05:02 GMT
Could you clarify what you mean by 'network partition'?

At first blush... not a good idea.

Can you please explain how a laptop suspend is an issue for those running HBase in a production
environment? 

I really want to encourage the committers to think more like product owners because HBase
is now being presented to the enterprise by Cloudera and Amazon. 
(MapR has M7 and I haven't seen either IBM or Pivotal in the wild.) 





On Jun 4, 2014, at 10:30 AM, Cosmin Lehene <clehene@adobe.com> wrote:

> Thanks Andy,
> 
> This is in line with what I was asking/thinking of.
> 
> To separate concerns a bit:
> The question is whether HRegionServer, instead of exiting, could have a standby
> state (maintaining the current region state would be optional, depending on
> whether we have shadow regions or not) where it doesn't serve requests
> but is still visible to the cluster. This way we could "wake it up" if we
> decide that it is safe (e.g. we remain consistent).
> 
> This scenario seems to solve both my laptop standby problem and
> (potentially fast) recovery after a network partition.
> 
> Cosmin
> 
> 
> 
> On 6/3/14, 3:42 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
> 
>> Although a dev laptop suspend and resume is equivalent to a total network
>> failure for the elapsed time, with all that implies, perhaps after
>> HBASE-10070 goes in it could be possible for a RS to stay up with its regions
>> switched to read-only replica mode and reconnect. The master, upon
>> discovering all regions still available on the cluster but in read-only
>> mode (no primary), could then reassign primaries, the net effect being
>> that no process needs restarting by a supervisor.
>> 
>> 
>> On Tue, Jun 3, 2014 at 3:27 PM, Andrew Purtell <apurtell@apache.org>
>> wrote:
>> 
>>> Stating the obvious: you just need to restart the RegionServer because it
>>> shut down. We use ZooKeeper for tracking server liveness, and from the
>>> ZooKeeper perspective a sufficiently long time elapsed without a heartbeat
>>> that the RegionServer's session expired. To date we've left it to the
>>> user to handle this with supervisory scripts, e.g. Puppet / Chef /
>>> Daemontools. I suppose a RegionServer could try to reinitialize as a new
>>> process, or the ./bin/hbase script could do this if you ask.
>>> 
>>> 
>>> On Tue, Jun 3, 2014 at 2:46 PM, Cosmin Lehene <clehene@adobe.com> wrote:
>>> 
>>>> I just realized that, for years, I've restarted HBase countless times,
>>>> every time my laptop comes out of standby.
>>>> I know well why I do this, but I also know I probably shouldn't have to:
>>>> I don't have to with Hadoop or ZooKeeper or other services, and
>>>> I wish I didn't need to with HBase either.
>>>> 
>>>> So short term, I'd like to know if there is already a better way.
>>>> 
>>>> Long term, I think this is a bigger, more fundamental resiliency aspect
>>>> that is perhaps not trivial, but probably worth thinking about in the
>>>> context of real deployments, and I wonder if there's something that
>>>> already tries to solve this.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Thanks,
>>>> Cosmin
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>> 
>>  - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
> 
> 
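Andrew's first reply notes that, to date, restarting a RegionServer whose ZooKeeper session expired is left to supervisory scripts such as Puppet, Chef or Daemontools. A minimal Python sketch of such a supervisor loop, under stated assumptions: the function name `supervise`, its restart limit, its fixed back-off, and the assumption that an aborting RegionServer exits with a non-zero code are all hypothetical illustrations, not part of HBase or of any tool named in the thread.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0):
    """Run `cmd` in the foreground and restart it when it fails.

    Hypothetical helper: a non-zero exit (assumed here to be what a
    RegionServer does after aborting on ZooKeeper session expiry) triggers a
    restart; exit code 0 is treated as an intentional stop.
    Returns the exit code of every run, in order.
    """
    exit_codes = []
    while True:
        # Block until the child process exits.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean shutdown: do not restart
        if len(exit_codes) > max_restarts:
            print("giving up after %d restarts" % max_restarts, file=sys.stderr)
            break
        print("exited with %d, restarting" % result.returncode, file=sys.stderr)
        # Fixed back-off so a crash loop does not spin the CPU.
        time.sleep(backoff_seconds)
    return exit_codes
```

To watch a RegionServer this way you would presumably pass a command that runs it in the foreground, for example something like `["./bin/hbase", "regionserver", "start"]`. Not restarting after exit code 0 is an assumed policy; Daemontools' `supervise`, for instance, restarts a service whenever it exits unless it has been told to keep it down.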


