hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
Date Sun, 22 Feb 2015 05:19:11 GMT
zhihai xu created YARN-3242:
-------------------------------

             Summary: Old ZK client session watcher event messed up new ZK client session
due to ZooKeeper asynchronously closing client session.
                 Key: YARN-3242
                 URL: https://issues.apache.org/jira/browse/YARN-3242
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.6.0
            Reporter: zhihai xu
            Assignee: zhihai xu
            Priority: Critical


Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously
closing client session.
A watcher event from the old ZK client session can still be delivered to ZKRMStateStore after
the old ZK client session has been closed.
This causes a serious problem: ZKRMStateStore gets out of sync with the ZooKeeper session.
There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher event comes from the
current session, so a watcher event from an old ZK client session that was just closed will
still be processed.
For example, if a Disconnected event is received from the old session after the new session is
connected, zkClient will be set to null:
{code}
        case Disconnected:
          LOG.info("ZKRMStateStore Session disconnected");
          oldZkClient = zkClient;
          zkClient = null;
          break;
{code}
ZKRMStateStore then won't receive a SyncConnected event from the new session, because the new
session is already in the SyncConnected state and won't send another SyncConnected event until
it is disconnected and connected again.
From that point on, all ZKRMStateStore operations fail with IOException "Wait for ZKClient
creation timed out" until the RM is shut down.
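
One possible guard is to tag every watcher event with the client it came from and ignore events
whose source is not the current client. The sketch below illustrates that idea in plain Java
without the ZooKeeper library. Client, Store, EventState, startNewSession and getZkClient are
hypothetical stand-ins, not the actual ZKRMStateStore or ZooKeeper API, and this is not
necessarily the change that will be made for this issue.

```java
// Minimal sketch (hypothetical names): drop watcher events from stale sessions.
class SessionGuardSketch {

    // Simplified stand-in for the watcher event states discussed above.
    enum EventState { SYNC_CONNECTED, DISCONNECTED }

    // Stand-in for a ZK client handle; one instance per client session.
    static final class Client {
        final String name;
        Client(String name) { this.name = name; }
    }

    // Stand-in for the single state store shared by all sessions.
    static final class Store {
        Client activeClient; // client of the most recently created session
        Client zkClient;     // usable client; null while disconnected

        // Called whenever a new client session is created.
        synchronized void startNewSession(Client client) {
            activeClient = client;
            zkClient = null; // not usable until SYNC_CONNECTED arrives
        }

        // Returns true if the event was handled, false if it was ignored
        // because it came from a session that is no longer current.
        synchronized boolean processWatchEvent(Client source, EventState state) {
            if (source != activeClient) {
                return false; // stale event from an old session
            }
            switch (state) {
                case SYNC_CONNECTED:
                    zkClient = source;
                    break;
                case DISCONNECTED:
                    zkClient = null;
                    break;
            }
            return true;
        }

        synchronized Client getZkClient() {
            return zkClient;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Client oldClient = new Client("old");
        Client newClient = new Client("new");

        store.startNewSession(oldClient);
        store.processWatchEvent(oldClient, EventState.SYNC_CONNECTED);

        // The old session is replaced and the new one connects.
        store.startNewSession(newClient);
        store.processWatchEvent(newClient, EventState.SYNC_CONNECTED);

        // A late Disconnected event from the old session arrives.
        boolean handled = store.processWatchEvent(oldClient, EventState.DISCONNECTED);

        System.out.println("late event handled: " + handled);
        System.out.println("zkClient still usable: " + (store.getZkClient() == newClient));
    }
}
```

With this check, the late Disconnected event from the old session no longer sets zkClient to
null, so the store stays attached to the new session. A Disconnected event from the current
session is still honoured.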



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
