hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-2971) On cluster startup, master/rs connect to ZK before it's fully ready causing a ConnectionLossException
Date Wed, 08 Sep 2010 21:02:34 GMT
On cluster startup, master/rs connect to ZK before it's fully ready causing a ConnectionLossException
-----------------------------------------------------------------------------------------------------

                 Key: HBASE-2971
                 URL: https://issues.apache.org/jira/browse/HBASE-2971
             Project: HBase
          Issue Type: Bug
          Components: zookeeper
    Affects Versions: 0.90.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.90.0


There is a long-standing race condition that has been glossed over until now (because
of our "loose" ZK usage).

The ZK server process can be in a state where it accepts the socket connection from our
client in the master or RS, but any operation against the server fails with a
ConnectionLossException. The ZK client handles this automagically and reconnects properly,
as long as we do not abort when we get this exception.

So this works on the last 0.89 and even with the master rewrite, but as we move towards strict
usage of ZK, we should wait for ZK availability before proceeding with startup.
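
As an illustration only (this is not the patch mentioned below), waiting for ZK availability
could be sketched as a bounded retry loop around a cheap probe operation. The class name
ZkStartupWait, the method waitUntilAvailable, and its parameters are hypothetical; in a real
master or RS the probe would presumably be a harmless ZK call such as an exists() check on
the root znode, and the simulated failure below stands in for a ConnectionLossException.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries a probe until it succeeds.
public class ZkStartupWait {

    /**
     * Runs the probe until it completes without throwing, or until
     * maxAttempts is reached. Returns the number of attempts used.
     * Throws IllegalStateException if the probe never succeeds.
     */
    public static int waitUntilAvailable(Callable<?> probe, int maxAttempts, long sleepMs) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                probe.call();      // e.g. a cheap read against ZK (assumption)
                return attempt;    // server answered, safe to continue startup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for ZK", e);
            } catch (Exception e) {
                last = e;          // treat as "not ready yet" and retry
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for ZK", e);
                }
            }
        }
        throw new IllegalStateException(
            "ZK not available after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated probe: fails twice, then succeeds on the third call.
        int used = waitUntilAvailable(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated connection loss");
            }
            return null;
        }, 5, 10L);
        System.out.println("available after " + used + " attempts");
    }
}
```

The point of the sketch is that startup blocks on the probe instead of aborting on the first
failure; the attempt limit and sleep interval are arbitrary illustrative values.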

I already have a patch in a local branch and it's working.  Will put up a patch soon against
the new master.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

