helix-user mailing list archives

From Ming Fang <mingf...@mac.com>
Subject Re: How to handle total Zookeeper restart
Date Mon, 04 Mar 2013 12:43:32 GMT
I manually brought down Zookeeper and erased the Zookeeper data on purpose, as a test.
The goal is to find a way to
1) continue processing in the event of a total Helix/Zookeeper failure and loss of state.
2) recover gracefully once Helix/Zookeeper is restarted.

On Mar 4, 2013, at 12:44 AM, kishore g <g.kishore@gmail.com> wrote:

> Hi Ming,
> Helix depends on the data in Zookeeper. It's OK for Zookeeper to restart, and Helix will
> handle it, but if Zookeeper loses its state (data directory), then unfortunately we cannot
> recover the state.
> How did you lose the Zookeeper cluster (including state)?
> thanks,
> Kishore G
> On Sun, Mar 3, 2013 at 8:58 PM, Ming Fang <mingfang@mac.com> wrote:
> Hi
> When I have a working Helix cluster with all participants working fine, and for whatever
> reason I lose the entire Zookeeper cluster (including all state),
> what is the best way to handle this?
> Ideally I want all the participants to continue working, and the only capability
> I would lose is Helix's ability to fail over.
> Upon restart of Zookeeper, the Controllers and Participants should register their latest
> state back to the new Zookeeper cluster.
> However, my tests thus far show that even though the HelixManagers reconnect, they do
> not write the necessary data into Zookeeper for the cluster to function correctly.
> For example, the external view callbacks are not showing the participants at all.
> Is this something Helix should handle, or is it up to the applications to detect the failure
> and then recreate new HelixManagers?
> Thanks
> --ming
