hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers
Date Fri, 30 Oct 2015 23:42:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6514:
-----------------------------------------------
    Status: Open  (was: Patch Available)

h4. Comment on current patch
You should look at the {{rampDownReduces()}} API and use it instead of hand-rolling {{decContainerReq}}. I actually think that once we do this, you can remove {{clearAllPendingReduceRequests()}} altogether.

I am looking at branch-2, and I think the current patch is better served on top of MAPREDUCE-6302 (and so only in 2.8+) given the numerous changes made there. The patch obviously doesn't apply on branch-2.7, which you set the target-version as (2.7.2). Canceling the patch.
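To illustrate the point, here is a toy, self-contained sketch (the class, field, and method names below are made up for illustration and are not {{RMContainerAllocator}} source): the difference is whether ramping down also decrements the outstanding ask sent to the RM, which is what a {{rampDownReduces()}}-style helper would do via {{decContainerReq}}.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of AM-side reduce bookkeeping; hypothetical names, not Hadoop source.
class ToyAllocator {
    final Deque<String> scheduledReduces = new ArrayDeque<>();
    final Deque<String> pendingReduces = new ArrayDeque<>();
    int askCount; // reduce containers currently requested from the RM

    void scheduleReduce(String id) {
        scheduledReduces.add(id);
        askCount++; // stands in for addContainerReq
    }

    // Hand-rolled ramp-down as in the current code: moves requests to
    // pending but never touches the ask, so askCount goes stale.
    void rampDownWithoutAskUpdate() {
        pendingReduces.addAll(scheduledReduces);
        scheduledReduces.clear();
    }

    // rampDownReduces()-style helper: decrements the ask for each
    // request as it is moved, so the RM stops offering reduce containers.
    void rampDownWithAskUpdate() {
        while (!scheduledReduces.isEmpty()) {
            pendingReduces.add(scheduledReduces.removeFirst());
            askCount--; // stands in for decContainerReq
        }
    }
}

public class RampDownSketch {
    public static void main(String[] args) {
        ToyAllocator buggy = new ToyAllocator();
        buggy.scheduleReduce("r1");
        buggy.scheduleReduce("r2");
        buggy.rampDownWithoutAskUpdate();
        System.out.println("stale ask after ramp-down: " + buggy.askCount);   // 2

        ToyAllocator fixed = new ToyAllocator();
        fixed.scheduleReduce("r1");
        fixed.scheduleReduce("r2");
        fixed.rampDownWithAskUpdate();
        System.out.println("updated ask after ramp-down: " + fixed.askCount); // 0
    }
}
{code}

In the first variant the ask stays at 2 after all reduces are ramped down, which is exactly the stale ask that keeps the RM allocating reduce containers the AM can no longer place.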

h4. Meta thought
If MAPREDUCE-6513 goes through per my latest proposal there, there is no need to cancel all the reduce asks, and thus no need for this patch, no?

h4. Release
In any case, this has been a long-standing problem (though I'm very surprised nobody caught it till now), so I'd propose we move this out to 2.7.3, or better 2.8+, so I can make progress on the 2.7.2 release. Thoughts?

> Job hangs as ask is not updated after ramping down of all reducers
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6514
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: MAPREDUCE-6514.01.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. This is not reflected in the ask, so the RM keeps assigning reduce containers while the AM cannot place them, as no reducer is scheduled (check the logs below the code).
> If the ask is updated immediately, the RM will be able to schedule mappers immediately, which is anyway the intention when we ramp down reducers. The scheduler need not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in MAPREDUCE-6513.
> {code}
> LOG.info("Ramping down all scheduled reduces:"
>     + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either  container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either  container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
