hadoop-mapreduce-issues mailing list archives

From "Maysam Yabandeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6043) Reducer-preemption does not kick in
Date Thu, 21 Aug 2014 17:43:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105665#comment-14105665 ]

Maysam Yabandeh commented on MAPREDUCE-6043:

After a preemption, the task id is added to
It is later removed once the preemption has finished successfully. In the second scenario
we observed, the preemption finished successfully, but its report was never received
from the RM; hence the variable was not decreased and future preemptions did not kick in.
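
The bookkeeping described above can be illustrated with a minimal Java sketch. This is not the RMContainerAllocator code: the class, the field `preempting`, and the method names are hypothetical, chosen only to show how a count that is decreased solely on an RM completion report stays stale when that report is lost.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the bookkeeping; not the actual Hadoop classes.
class PreemptionBookkeeping {
    // Task ids for which a preemption was requested and not yet reported complete.
    private final Set<String> preempting = new HashSet<>();

    // Called when the allocator decides to preempt a reducer.
    void onPreemptionRequested(String taskId) {
        preempting.add(taskId);
    }

    // Called only when the RM reports the container as completed.
    void onCompletionReported(String taskId) {
        preempting.remove(taskId);
    }

    int preemptingCount() {
        return preempting.size();
    }

    public static void main(String[] args) {
        PreemptionBookkeeping b = new PreemptionBookkeeping();
        b.onPreemptionRequested("reducer_1");
        // The preemption finishes, but the RM report never arrives, so
        // onCompletionReported("reducer_1") is never called and the count stays at 1.
        System.out.println(b.preemptingCount());
    }
}
```

In this model the count only returns to zero if the completion report arrives, which is the dependency the comment identifies as fragile.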

> Reducer-preemption does not kick in
> -----------------------------------
>                 Key: MAPREDUCE-6043
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
> We have seen various cases in which reducer-preemption does not kick in and the scheduled
> mappers wait behind running reducers forever. Each time there seems to be a different scenario.
> So far we have tracked down two such cases. The common element between them is that
> the variables in RMContainerAllocator go out of sync, since they are updated only when a completed
> container is reported by the RM. However, there are many corner cases in which such a report is never
> received from the RM and yet the MapReduce app moves forward. One possible fix would be to also
> update these variables after exceptional cases.
> The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded
> The preemption is triggered if the following is true:
> {code}
> headroom + am * |m| + pr * |r| < mapResourceRequest
> {code} 
> where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers
> being preempted, and |r| is the reducer size. If any of these variables goes out of sync,
> the preemption will not kick in. In the following comment, we explain two such cases.
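
As a rough illustration of the condition quoted above, here is a minimal Java sketch. It is not the code of RMContainerAllocator::preemptReducesIfNeeded; the class name, method name, parameter names, and the use of plain memory values (assumed to be in MB) are assumptions made for the example. It evaluates the inequality once with accurate counters and once with a stale pr.

```java
// Hypothetical helper that evaluates the trigger condition from the issue description.
class PreemptionCheck {
    // Returns true when headroom + am * |m| + pr * |r| < mapResourceRequest.
    static boolean shouldPreempt(long headroom,
                                 int assignedMaps, long mapSize,
                                 int preemptingReduces, long reduceSize,
                                 long mapResourceRequest) {
        long accountedFor = headroom
                + (long) assignedMaps * mapSize
                + (long) preemptingReduces * reduceSize;
        return accountedFor < mapResourceRequest;
    }

    public static void main(String[] args) {
        // Assumed sizes: 1024 MB per mapper, 2048 MB per reducer, no headroom.
        // Accurate counters (pr = 0): 0 + 0 + 0 < 1024, so preemption triggers.
        System.out.println(shouldPreempt(0, 0, 1024, 0, 2048, 1024)); // prints true

        // Stale counter (pr stuck at 1): 0 + 0 + 2048 < 1024 is false,
        // so preemption does not trigger although no reducer is being preempted.
        System.out.println(shouldPreempt(0, 0, 1024, 1, 2048, 1024)); // prints false
    }
}
```

With these assumed sizes, a single stale entry in pr is enough to keep the left-hand side above the map request, which matches the symptom of scheduled mappers waiting indefinitely.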

This message was sent by Atlassian JIRA
