hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4679) When work-preserving restart is enabled, the scheduler should wait for the earlier of recovery completion and configured wait time
Date Tue, 09 Feb 2016 04:34:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138333#comment-15138333 ]

Karthik Kambatla commented on YARN-4679:
----------------------------------------

Thanks Jason. My bad - completely forgot the discussion around this. 

[~jianhe], [~vinodkv] - I vaguely remember us discussing the notion of a threshold for the
fraction of nodes that were previously connected, in addition to this timeout. Do I remember right?
Do you think it still makes sense, and can we use it as a proxy for recovery completion?
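
For illustration only, the node-fraction idea could be sketched roughly as below. This is not YARN code: the class name, method name, and parameters are hypothetical, and it assumes the ResourceManager knows how many nodes were connected before the restart and how many have reconnected since.

```java
/** Hypothetical helper, not part of the YARN code base. */
final class NodeFractionProxy {
    private NodeFractionProxy() {}

    /**
     * Treats recovery as "likely complete" once the fraction of previously
     * connected nodes that have reconnected reaches the given threshold.
     *
     * @param previouslyConnected nodes known before the restart (assumed input)
     * @param reconnected         nodes that have reconnected so far (assumed input)
     * @param threshold           required fraction in [0.0, 1.0]
     */
    static boolean recoveryLikelyComplete(int previouslyConnected,
                                          int reconnected,
                                          double threshold) {
        if (previouslyConnected <= 0) {
            // No nodes were known before the restart, so there is nothing to wait for.
            return true;
        }
        double fraction = (double) reconnected / previouslyConnected;
        return fraction >= threshold;
    }
}
```

Such a check would only be a proxy: it says nothing about whether the reconnected nodes have finished reporting their containers.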

> When work-preserving restart is enabled, the scheduler should wait for the earlier of
> recovery completion and configured wait time
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4679
>                 URL: https://issues.apache.org/jira/browse/YARN-4679
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Karthik Kambatla
>
> When work-preserving restart is enabled, it appears the restart (or failover) is unconditionally
> blocked for the configured delay even if the recovery itself finishes sooner. This
> should be updated to wait for the earlier of the two conditions. Also, it would be nice to
> allow setting the config to -1 to indicate waiting as long as needed for the recovery to complete.
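
The behaviour requested above (wait for the earlier of recovery completion and the configured delay, with -1 meaning no time limit) can be sketched as follows. This is a minimal illustration, not the ResourceManager implementation: the class, the method, and the use of a CountDownLatch to signal recovery completion are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the YARN code base. */
final class RecoveryWait {
    /** Sentinel proposed in the issue: wait until recovery completes. */
    static final long WAIT_FOREVER = -1L;

    private RecoveryWait() {}

    /**
     * Blocks until the latch reaches zero (recovery done) or maxWaitMs elapses,
     * whichever happens first. With WAIT_FOREVER there is no time limit.
     *
     * @return true if recovery completed; false on timeout or interruption
     */
    static boolean awaitRecovery(CountDownLatch recoveryDone, long maxWaitMs) {
        try {
            if (maxWaitMs == WAIT_FOREVER) {
                recoveryDone.await();
                return true;
            }
            // Returns immediately with true if the latch is already at zero,
            // so a fast recovery is not held up for the full delay.
            return recoveryDone.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch the recovering component would call `recoveryDone.countDown()` when it finishes, and the scheduler would start allocating as soon as `awaitRecovery` returns, regardless of which condition ended the wait.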




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
