hadoop-yarn-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable
Date Wed, 30 Nov 2016 21:35:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709850#comment-15709850 ]

Daniel Templeton edited comment on YARN-5694 at 11/30/16 9:35 PM:
------------------------------------------------------------------

The test failure looks legit (which is odd since it worked locally), but the rest of the issues
are bogus.  I'll take a closer look at the test failure.


was (Author: templedf):
The test failure looks legit (which is odd since it worked locally), but the rest of the issues
are bogus.

> ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5694
>                 URL: https://issues.apache.org/jira/browse/YARN-5694
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.3
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-medium
>         Attachments: YARN-5694.001.patch, YARN-5694.002.patch, YARN-5694.003.patch, YARN-5694.004.patch,
YARN-5694.004.patch, YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, YARN-5694.008.patch,
YARN-5694.branch-2.6.001.patch, YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch,
YARN-5694.branch-2.7.004.patch, YARN-5694.branch-2.7.005.patch
>
>
> {{ZKRMStateStore.doStoreMultiWithRetries()}} holds the lock while trying to talk to ZK.
> If the connection fails, it retries while still holding the lock.  The retries are intended
> to be strictly time limited, but when the ZK node is unreachable, the time limit fails to
> take effect, and the thread ends up holding the lock for over an hour.  Transitioning the RM
> to standby requires that same lock, so in exactly the case where the RM should be
> transitioning to standby, the {{VerifyActiveStatusThread}} blocks it from happening.
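
The locking pattern described above can be illustrated with a minimal, self-contained Java sketch. This is not the ZKRMStateStore code and not the attached patch: the class BoundedRetrySketch, its methods, and the millisecond budgets are all hypothetical names chosen for illustration. It shows a retry loop that runs while holding a lock but checks a wall-clock deadline after every attempt, so a second caller that needs the same lock (standing in for the transition to standby) waits for a bounded time. The sketch assumes each attempt returns or throws promptly; an attempt that itself blocks for a long time would still need its own timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; not the actual ZKRMStateStore implementation. */
final class BoundedRetrySketch {
    // Stands in for the lock shared by the store operation and the standby transition.
    private final ReentrantLock storeLock = new ReentrantLock();

    /**
     * Retries the attempt while holding the lock, but gives up once the
     * time budget is spent, so the lock is released in bounded time.
     * Returns true if an attempt succeeded, false otherwise.
     */
    boolean storeWithBoundedRetries(Callable<Boolean> attempt, long budgetMs, long pauseMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMs);
        storeLock.lock();
        try {
            while (true) {
                try {
                    if (Boolean.TRUE.equals(attempt.call())) {
                        return true;
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                } catch (Exception e) {
                    // Treat any other failure (e.g. a connection error) as a failed attempt.
                }
                // The deadline is checked after every attempt, whatever the failure was.
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false;
                }
                try {
                    Thread.sleep(Math.min(pauseMs, remainingMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        } finally {
            storeLock.unlock();
        }
    }

    /** Stands in for the standby transition: it needs the same lock, and waits at most waitMs for it. */
    boolean transitionToStandby(long waitMs) {
        try {
            if (!storeLock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true; // the real state change would happen here
        } finally {
            storeLock.unlock();
        }
    }

    public static void main(String[] args) {
        BoundedRetrySketch store = new BoundedRetrySketch();
        // Simulate an unreachable ZK node: every attempt fails immediately.
        boolean stored = store.storeWithBoundedRetries(
            () -> { throw new java.io.IOException("ZK unreachable (simulated)"); }, 200, 20);
        System.out.println("stored=" + stored + ", standby=" + store.transitionToStandby(100));
    }
}
```

The design point is that the deadline is evaluated inside the loop on every iteration rather than being derived from a retry count, so the lock hold time stays bounded even when each attempt fails in an unexpected way.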



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


