hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [jira] Created: (HBASE-1312) ZooKeeper: Master's ephemeral node went away while it was still up and functioning normally
Date Mon, 06 Apr 2009 03:25:10 GMT

That's an unfortunate side effect of some aspect of the ZK
implementation, I suppose. 

HBase clients, regionservers, and masters with watches on
ephemeral nodes will have to treat their disappearance as
advisory only and check back once or twice before taking
any recovery actions. It lengthens the time for recovery
beyond what would be necessary without this wrinkle, which
is unfortunate. 
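
To make the "check back once or twice" idea concrete, here is a minimal sketch. The names AdvisoryWatch, confirmGone, and the nodeExists probe are hypothetical, not existing HBase or ZooKeeper code; in practice the probe would be an exists() lookup on the watched path. Recovery would start only if the node is still missing on every re-check.

```java
import java.util.function.BooleanSupplier;

public class AdvisoryWatch {
    /**
     * Treats a "node deleted" notification as advisory: re-checks the node
     * a few times and reports it gone only if it is missing on every check.
     *
     * nodeExists is a hypothetical probe supplied by the caller.
     */
    public static boolean confirmGone(BooleanSupplier nodeExists,
                                      int rechecks, long delayMillis) {
        for (int i = 0; i < rechecks; i++) {
            try {
                Thread.sleep(delayMillis); // give the owner time to re-register
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // interrupted: be conservative, do not recover
            }
            if (nodeExists.getAsBoolean()) {
                return false; // node is back, so the deletion was transient
            }
        }
        return true; // missing on every re-check
    }
}
```

The cost is the one noted above: recovery is delayed by up to rechecks * delayMillis.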

Just to be clear, by restart you are talking about
re-initializing the ZK wrapper only, correct? It should not
be necessary to restart everything on a node to deal with
an expired ZK session, right?
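
For concreteness, this is roughly what re-initializing only the wrapper could look like. It is a sketch under assumptions: the Session interface and ZkWrapperSketch class are hypothetical stand-ins, not the real ZooKeeper client API or the actual HBase wrapper, and whether this is sufficient is exactly the open question here.

```java
import java.util.function.Supplier;

public class ZkWrapperSketch {
    /** Hypothetical stand-in for a ZooKeeper handle; not the real client API. */
    public interface Session {
        void createEphemeral(String path);
        void close();
    }

    private final Supplier<Session> factory;
    private final String ephemeralPath;
    private Session session;

    public ZkWrapperSketch(Supplier<Session> factory, String ephemeralPath) {
        this.factory = factory;
        this.ephemeralPath = ephemeralPath;
        this.session = open();
    }

    /** Opens a fresh session and registers this server's ephemeral node. */
    private Session open() {
        Session s = factory.get();
        s.createEphemeral(ephemeralPath);
        return s;
    }

    /** Call when the session-expired event arrives. */
    public synchronized void onSessionExpired() {
        session.close();  // release the dead handle
        session = open(); // new handle, ephemeral node created again
    }
}
```

The rest of the process keeps running; only the handle and the ephemeral node (e.g. /hbase/master) are replaced.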

> From: Nitay
> The master did not respond correctly to a SessionExpired
> event. I don't think there's a ZK bug. This is like
> HBASE-1232. Both the master and regionserver got a
> SessionExpired event. The bug I fixed for Ryan was just
> with the client getting a SessionExpired. Andrew's
> cluster shows us that it's just as likely for the master/
> RS to get this event.
> The only thing you can do on a SessionExpired event is to
> completely restart the node. SessionExpired means your
> ZooKeeper handle is dead, and your ephemeral nodes will go
> away. Since every server in HBase has some ephemeral
> node that indicates its liveness (e.g. /hbase/master,
> /hbase/rs/...), the node has to completely restart.
> HBASE-1232, HBASE-1311, and HBASE-1312 are all the same
> problem, just with three different points of view (client,
> RS, master).
> On Sun, Apr 5, 2009 at 2:32 PM, Ryan Rawson wrote:
> > ZK keeps the node up as long as the session is still
> > valid.
> > So the question is:
> > - did the master not respond correctly to an expired
> > session?
> > - is there a ZK bug (HOPE NOT!)
> >
> > -ryan
> >
> > On Sun, Apr 5, 2009 at 2:22 PM, Andrew Purtell wrote:
> > > ZooKeeper: Master's ephemeral node went away
> > > while it was still up and functioning normally

