hadoop-mapreduce-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5489) MR jobs hangs as it does not use the node-blacklisting feature in RM requests
Date Thu, 03 Oct 2013 00:19:42 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-5489:
-----------------------------------

    Attachment: MAPREDUCE-5489.1.patch

I've created a patch that makes the AM send its blacklisted nodes to the RM. The logic is
as follows:

1. Add blacklistAdditions and blacklistRemovals to record the nodes added to or removed from
the blacklist between two allocate calls. Both collections are sent to the RM in the next
allocate call.

2. Whenever a container fails on a host, the host is blacklisted, and it is added to
blacklistAdditions if the blacklist is not being ignored.

3. When changing from not ignoring the blacklist to ignoring it, all blacklisted nodes are
added to blacklistRemovals.

4. When changing from ignoring the blacklist to not ignoring it, all blacklisted nodes are
added to blacklistAdditions.

5. Switching between ignoring and not ignoring the blacklist does not take effect on the RM
until the next allocate call, but it does take effect eventually.
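
The bookkeeping in steps 1-5 can be sketched roughly as below. This is only an illustration,
not the code in MAPREDUCE-5489.1.patch: the class name BlacklistDeltaSketch, the Delta holder,
and the methods containerFailedOnHost, setIgnoreBlacklisting and drainForAllocate are all
hypothetical, and hosts are represented as plain strings. Dropping pending deltas of the
opposite kind when the mode switches is also an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the bookkeeping in steps 1-5; not the code from the patch.
// Not thread-safe: a real AM would need to guard these sets.
class BlacklistDeltaSketch {

    // Immutable snapshot of the deltas that would accompany one allocate call.
    static final class Delta {
        final Set<String> additions;
        final Set<String> removals;

        Delta(Set<String> additions, Set<String> removals) {
            this.additions = Collections.unmodifiableSet(new HashSet<>(additions));
            this.removals = Collections.unmodifiableSet(new HashSet<>(removals));
        }
    }

    private final Set<String> blacklistedNodes = new HashSet<>();
    // Step 1: changes accumulated between two allocate calls.
    private final Set<String> blacklistAdditions = new HashSet<>();
    private final Set<String> blacklistRemovals = new HashSet<>();
    private boolean ignoreBlacklisting = false;

    // Step 2: a container failure blacklists the host; the RM is only told
    // about it if the blacklist is currently being honored.
    void containerFailedOnHost(String host) {
        boolean newlyBlacklisted = blacklistedNodes.add(host);
        if (newlyBlacklisted && !ignoreBlacklisting) {
            blacklistAdditions.add(host);
        }
    }

    // Steps 3 and 4: switching modes queues every blacklisted node for
    // removal from (or re-addition to) the RM-side blacklist.
    void setIgnoreBlacklisting(boolean ignore) {
        if (ignore == ignoreBlacklisting) {
            return; // no mode change, nothing to queue
        }
        ignoreBlacklisting = ignore;
        if (ignore) {
            blacklistAdditions.clear(); // assumption: drop unsent additions
            blacklistRemovals.addAll(blacklistedNodes);
        } else {
            blacklistRemovals.clear(); // assumption: drop unsent removals
            blacklistAdditions.addAll(blacklistedNodes);
        }
    }

    // Steps 1 and 5: the pending deltas are handed over only when the next
    // allocate call is built, and are then reset for the following interval.
    Delta drainForAllocate() {
        Delta delta = new Delta(blacklistAdditions, blacklistRemovals);
        blacklistAdditions.clear();
        blacklistRemovals.clear();
        return delta;
    }

    public static void main(String[] args) {
        BlacklistDeltaSketch am = new BlacklistDeltaSketch();
        am.containerFailedOnHost("host1");
        Delta next = am.drainForAllocate();
        System.out.println("additions=" + next.additions + " removals=" + next.removals);
    }
}
```

Because the sets are drained only in drainForAllocate, a mode switch becomes visible to the RM
no earlier than the next allocate call, which matches the delay described in step 5.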

Test cases have been modified to test whether the RM is aware of the blacklisted nodes.

> MR jobs hangs as it does not use the node-blacklisting feature in RM requests
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5489.1.patch
>
>
> When the RM restarted, one NM went bad (bad disk) during the restart. The AM blacklisted that
> NM, but the RM keeps giving it containers on the same node even though the AM doesn't want them there.
> The AM needs to be changed to explicitly blacklist the node in its RM requests.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
