hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
Date Thu, 26 Feb 2015 19:43:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339012#comment-14339012 ]

Vinod Kumar Vavilapalli commented on YARN-3025:

Coming in very late, apologies.

Some comments:
 - Echoing Bikas's first comment: today the AMs are expected to maintain their own scheduling
state. With this you are changing that: part of the scheduling state will be remembered but
the rest won't be. We should clearly draw a line somewhere; where is it?
 - [~zjshen] did a very good job of dividing the persistence concerns, but what guarantee
is given to the app writers? "I'll return the list of blacklisted nodes whenever I can,
but shoot, I died, so I can't help you much" is not going to cut it. If we want reliable notifications,
we should build a protocol between AM and RM about the persistence of the blacklisted node
list, which is too much complexity if you ask me. Why not leave it to the apps?
 - The blacklist information is per application-attempt, and the scheduler forgets previous
application-attempts today. So as I understand it, the patch doesn't work.
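
To make the "leave it to the apps" option concrete, here is a rough sketch of an AM keeping and persisting its own blacklist so that a restarted attempt can reload it without asking the RM. Everything in it is an assumption for illustration: the class name AmBlacklistStore, its methods, and the use of a plain local file (a real AM would more likely write to HDFS or another store that survives the container). It is not part of any attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical AM-side store; the name and file layout are illustrative only.
class AmBlacklistStore {
  private final Path file;
  private final TreeSet<String> nodes = new TreeSet<>();

  // A new AM attempt reloads whatever the previous attempt persisted.
  AmBlacklistStore(Path file) {
    this.file = file;
    try {
      if (Files.exists(file)) {
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
          if (!line.isEmpty()) {
            nodes.add(line);
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Apply additions, then removals, and rewrite the file (one node per line).
  synchronized void update(List<String> additions, List<String> removals) {
    nodes.addAll(additions);
    nodes.removeAll(removals);
    try {
      Files.write(file, nodes, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Sorted copy of the current blacklist; callers cannot mutate internal state.
  synchronized List<String> blacklistedNodes() {
    return new ArrayList<>(nodes);
  }
}
```

With something like this, the guarantee question stays inside the application: the AM decides where the list lives and how durable it needs to be.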

> Provide API for retrieving blacklisted nodes
> --------------------------------------------
>                 Key: YARN-3025
>                 URL: https://issues.apache.org/jira/browse/YARN-3025
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>       List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes so that
> the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}
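
For readers without the patch at hand, a minimal sketch of how the two quoted methods could fit together. The holder class BlacklistTracker is hypothetical and is not the YARN class that owns updateBlacklist; applying additions before removals and returning a sorted copy are assumptions of this sketch, not behavior confirmed by the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder class, for illustration only.
class BlacklistTracker {
  private final Set<String> blacklist = new HashSet<>();

  // Same signature as the existing method quoted in the description.
  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
    if (blacklistAdditions != null) {
      blacklist.addAll(blacklistAdditions);
    }
    if (blacklistRemovals != null) {
      blacklist.removeAll(blacklistRemovals);
    }
  }

  // Proposed accessor: returns a sorted snapshot, so callers never see or
  // modify the internal set.
  public synchronized List<String> getBlacklistedNodes() {
    List<String> snapshot = new ArrayList<>(blacklist);
    Collections.sort(snapshot);
    return snapshot;
  }
}
```

Note that the sketch keeps the list only in memory, so by itself it says nothing about what survives an AM or RM restart.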

This message was sent by Atlassian JIRA