hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3460) MR AM can hang if containers are allocated on a node blacklisted by the AM
Date Wed, 30 Nov 2011 16:29:40 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3460:
-------------------------------------------

    Attachment: MR-3460.txt

Sid, you were correct. It was not exercising the expected code path. I was confused because
the FAST_FAIL_MAP container was still being assigned. It was just not sent to the scheduler
before the node was blacklisted.

I have updated the test, and also the code itself. The original patch updated both the list
of failed maps and the list of pending maps, which caused the actual allocation of the
container to fail later on.
                
> MR AM can hang if containers are allocated on a node blacklisted by the AM
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3460
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Siddharth Seth
>            Assignee: Robert Joseph Evans
>            Priority: Blocker
>         Attachments: MR-3460.txt, MR-3460.txt
>
>
> When an AM is assigned a FAILED_MAP (priority = 5) container on a nodemanager which it
> has blacklisted, it tries to find a corresponding container request.
> This uses the hostname to find the matching container request, and can end up returning
> any of the ContainerRequests which may have requested a container on this node. This container
> request is cleaned to remove the bad node, and then added back to the RM 'ask' list.
> The AM cleans the 'ask' list after each heartbeat. The RM Allocator is still aware of
> the priority=5 container (in 'remoteRequestsTable'), but this never gets added back to the
> 'ask' set, which is what is sent to the RM.
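
The bookkeeping described in the issue can be illustrated with a toy model. This is only a sketch in Python, not the Hadoop code: the class ToyRequestor, its methods, the priority-20 request, and the "*" any-host marker are hypothetical, and only the names 'ask' and 'remoteRequestsTable' come from the issue text. It assumes the hostname lookup returns the first request recorded for that host.

```python
class ToyRequestor:
    """Hypothetical stand-in for the AM's request bookkeeping."""

    def __init__(self):
        # Everything the AM still believes is outstanding: (priority, host) -> count.
        self.remote_requests_table = {}
        # Only what will be sent on the next heartbeat.
        self.ask = set()

    def add_request(self, priority, host):
        key = (priority, host)
        self.remote_requests_table[key] = self.remote_requests_table.get(key, 0) + 1
        self.ask.add(key)

    def find_by_host(self, host):
        # Lookup by hostname only: may return a request of any priority.
        for key in self.remote_requests_table:
            if key[1] == host:
                return key
        return None

    def reask_without_host(self, key):
        # "Clean" the matched request of the bad node and ask for it again anywhere.
        del self.remote_requests_table[key]
        self.add_request(key[0], "*")

    def heartbeat(self):
        # Send the current 'ask' set, then clear it.
        sent = set(self.ask)
        self.ask.clear()
        return sent


def simulate():
    am = ToyRequestor()
    am.add_request(20, "bad-node")  # some other request naming the same node
    am.add_request(5, "bad-node")   # the priority=5 request from the issue
    first = am.heartbeat()          # both requests go out, 'ask' is cleared

    # A priority=5 container arrives on bad-node, which is now blacklisted.
    matched = am.find_by_host("bad-node")  # picks the priority-20 request
    am.reask_without_host(matched)
    second = am.heartbeat()         # priority 5 is not in this heartbeat
    return first, matched, second, am.remote_requests_table


if __name__ == "__main__":
    first, matched, second, table = simulate()
    print("first heartbeat: ", sorted(first))
    print("matched request: ", matched)
    print("second heartbeat:", sorted(second))
    print("still in table:  ", sorted(table))
```

In this model the priority-5 entry stays in remote_requests_table but is absent from the second heartbeat, which mirrors the hang described above.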

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
