hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4635) Add global blacklist tracking for AM container failure.
Date Tue, 02 Feb 2016 17:34:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128628#comment-15128628
] 

Sunil G commented on YARN-4635:
-------------------------------

bq. it is possible that two lists together could unexpectedly blacklist all nodes. 
Hi [~djp]. Is this the case where node1 to node6 are blacklisted by the app and node7 to node10
are blacklisted by the global manager (assuming the cluster has node1 to node10 and {{disableThreshold}} is
0.8)?

Could we also check {{disableThreshold}} against the combined Set that we now create? If it
crosses the limit, we could clear the app-based / global-based blacklist entries from this list.
Would this solve the above-mentioned scenario?
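
To make the scenario concrete: with 10 nodes and a threshold of 0.8, an app list of 6 nodes and a global list of 4 nodes each stay under the limit on their own, but their union covers all 10 nodes. Below is a minimal sketch of the extra check, not the code in the attached patch. The class name, the method name {{effectiveBlacklist}} and its parameters are hypothetical. It also assumes the blacklist is ignored once its size reaches {{disableThreshold * totalNodes}}, and it clears the whole combined list in that case, which is only one reading of the suggestion.

```java
import java.util.HashSet;
import java.util.Set;

public class CombinedBlacklistSketch {

    /**
     * Hypothetical helper (not part of YARN): merges the per-app and global
     * blacklists and applies the disable threshold to the merged set.
     *
     * Assumption: the blacklist is ignored when its size reaches
     * disableThreshold * totalNodes.
     */
    static Set<String> effectiveBlacklist(Set<String> appBlacklist,
                                          Set<String> globalBlacklist,
                                          int totalNodes,
                                          double disableThreshold) {
        // Union of both lists; a node that appears in both is counted once.
        Set<String> combined = new HashSet<>(appBlacklist);
        combined.addAll(globalBlacklist);

        double limit = disableThreshold * totalNodes;
        if (combined.size() >= limit) {
            // Too many nodes would be excluded, so drop the blacklist
            // instead of leaving the AM with nowhere to run.
            return new HashSet<>();
        }
        return combined;
    }

    public static void main(String[] args) {
        Set<String> app = new HashSet<>();
        for (int i = 1; i <= 6; i++) {
            app.add("node" + i);      // node1..node6, blacklisted by the app
        }
        Set<String> global = new HashSet<>();
        for (int i = 7; i <= 10; i++) {
            global.add("node" + i);   // node7..node10, blacklisted globally
        }
        // 6 and 4 are each below 0.8 * 10, but the union is not.
        Set<String> result = effectiveBlacklist(app, global, 10, 0.8);
        System.out.println("Effective blacklist size: " + result.size());
    }
}
```

Checking the threshold on the union rather than on each list separately is what catches the case where neither list alone crosses the limit.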

> Add global blacklist tracking for AM container failure.
> -------------------------------------------------------
>
>                 Key: YARN-4635
>                 URL: https://issues.apache.org/jira/browse/YARN-4635
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist in addition to each app’s blacklist to track AM container failures
> that have a global effect. That means we need to differentiate whether a non-successful
> ContainerExitStatus is caused by the NM or is more related to the app.
> For more details, please refer to the document in YARN-4576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
