hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
Date Thu, 04 Sep 2014 15:48:52 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5279:
            Priority: Critical  (was: Major)
    Target Version/s: 2.6.0
             Summary: Jobs can deadlock if headroom is limited by cpu instead of memory  (was: mapreduce scheduling deadlock)

This seems like an important fix, as even large clusters often have small queues which will
emulate the same scenario as a small cluster setup.

[~peng.zhang] could you please rebase the patch on trunk?  Some other comments on the patch:

The previous code expected and handled getAvailableResources() returning null, but I don't
see a similar handling of null in the patch.  If headroom and newHeadroom end up both null
then I'm worried about this part of the patch:
+    if (newContainers.size() + finishedContainers.size() > 0 || headRoom != newHeadRoom || !headRoom.equals(newHeadRoom)) {
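
One null-safe way to write that comparison is sketched below. It is an illustration only, not the patch's code: Headroom is a hypothetical stand-in for YARN's Resource, and HeadroomCheck, headroomChanged and shouldRecalculate are invented names. The only library call is java.util.Objects.equals, which returns true when both arguments are null and never dereferences a null argument.

```java
import java.util.Objects;

// Hypothetical stand-in for YARN's Resource; only equals() matters here.
final class Headroom {
    final int memory;
    final int vcores;

    Headroom(int memory, int vcores) {
        this.memory = memory;
        this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Headroom)) {
            return false;
        }
        Headroom other = (Headroom) o;
        return memory == other.memory && vcores == other.vcores;
    }

    @Override
    public int hashCode() {
        return Objects.hash(memory, vcores);
    }
}

final class HeadroomCheck {
    // Returns false when both are null, true when exactly one is null,
    // and otherwise compares by value. No NullPointerException is possible.
    static boolean headroomChanged(Headroom oldHeadroom, Headroom newHeadroom) {
        return !Objects.equals(oldHeadroom, newHeadroom);
    }

    // Assumed shape of the condition under review: recalculate when any
    // container was added or finished, or when the headroom changed.
    static boolean shouldRecalculate(int newContainers, int finishedContainers,
                                     Headroom oldHeadroom, Headroom newHeadroom) {
        return newContainers + finishedContainers > 0
                || headroomChanged(oldHeadroom, newHeadroom);
    }
}
```

With this form, two null headrooms count as "unchanged" instead of reaching a call to equals on a null reference.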

Also I think it would be better to not have an implicit vcore method to create a Resource
(appears to be unused in the patch anyway), so I'd advocate for removing this new method:
+  public static Resource createResource(int memory) {
+    return createResource(memory, (memory > 0) ? 1 : 0);
+  }

> Jobs can deadlock if headroom is limited by cpu instead of memory
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-5279
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, scheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>             Fix For: trunk
>         Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch
> YARN-2 introduced CPU as a scheduling dimension, but the MR RMContainerAllocator doesn't take virtual cores into account while scheduling reduce tasks.
> This may cause more reduce tasks to be scheduled because memory alone is sufficient. On a small cluster, this ends in deadlock: all running containers are reduce tasks, but the map phase has not finished.
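
The gap described in the report can be illustrated with a small arithmetic sketch. This is not RMContainerAllocator's logic; ReducerHeadroom and both method names are hypothetical, and task sizes are assumed to be positive. It only shows how a memory-only estimate can exceed what the headroom allows once vcores are counted.

```java
final class ReducerHeadroom {
    // Containers that appear to fit when only memory is considered.
    static int fitByMemory(int headroomMemMb, int taskMemMb) {
        return headroomMemMb / taskMemMb;
    }

    // Containers that fit when both memory and vcores are considered:
    // the scarcer dimension sets the limit.
    static int fitByMemoryAndVcores(int headroomMemMb, int headroomVcores,
                                    int taskMemMb, int taskVcores) {
        return Math.min(headroomMemMb / taskMemMb, headroomVcores / taskVcores);
    }
}
```

For example, with 8192 MB and 0 vcores of headroom and 1024 MB, 1-vcore tasks, the memory-only estimate is 8 while the two-dimensional estimate is 0.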
