hadoop-mapreduce-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-5617) map task is not re-launched when the task is failed while reducers are running with full cluster capacity - which will lead to job hang
Date Sat, 09 May 2015 02:18:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G resolved MAPREDUCE-5617.
--------------------------------
    Resolution: Invalid

Yes, Vinod.

I am marking this as Invalid. I checked a similar scenario, and I could see the map task getting re-launched.

However, I will look into it in more detail, and if it recurs, I'll reopen it.

> map task is not re-launched when the task is failed while reducers are running with full cluster capacity - which will lead to job hang
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5617
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5617
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: SuSe Linux
>            Reporter: Sunil G
>            Priority: Critical
>
> In a cluster with 16 GB capacity, a job was started with 100 maps and 10 reducers.
> After the reducers had started executing, one NM (NodeManager) went down, which caused 2 maps to fail. At that time, the remaining 8 GB was used by 6 reducers and the AM, so there was no room to launch the failed maps. [The NM never came back up, and the cluster size became 8 GB.]
> Even if we kill one of the reducers, the map still cannot be launched, because the priority of the failed map is lower than that of the reducers. So the RM allocates containers only to the remaining reducers.
> This causes the job to hang on the reducer side.
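The hang described in the report can be illustrated with a toy allocator. This is a hypothetical model, not the logic of Hadoop's MRAppMaster or ResourceManager: the container sizes (2 GB for the AM, 1 GB per task), the priority numbers, and the names `Request`, `allocate` and `scenario` are all assumptions made for illustration. It follows the YARN convention that a smaller priority value is served first and grants requests strictly in priority order. Since the issue was resolved as Invalid, the sketch only models the reported scenario, not confirmed Hadoop behavior.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the reported scenario; NOT Hadoop's real allocator.
# All sizes are in GB.
CLUSTER_GB = 16
LOST_NM_GB = 8          # one NodeManager goes down and never comes back
AM_GB = 2               # hypothetical sizes, chosen so the AM plus 6 running
REDUCER_GB = 1          # reducers use exactly the 8 GB that remain
MAP_GB = 1

# Smaller value = higher priority (YARN convention). The numbers are
# hypothetical; the report only says failed maps rank below reducers.
REDUCE_PRIORITY = 10
FAILED_MAP_PRIORITY = 20


@dataclass(order=True)
class Request:
    priority: int                       # only field used for ordering
    name: str = field(compare=False)
    gb: int = field(compare=False)


def allocate(free_gb, pending):
    """Grant requests strictly in priority order until the next one no longer fits."""
    granted = []
    for req in sorted(pending):         # stable sort: ties keep insertion order
        if req.gb > free_gb:
            break                       # head-of-line request blocks everything behind it
        free_gb -= req.gb
        granted.append(req.name)
    return granted


def scenario(failed_map_priority):
    """Return (grants before killing a reducer, grants after killing one)."""
    free_gb = (CLUSTER_GB - LOST_NM_GB) - (AM_GB + 6 * REDUCER_GB)  # 8 - 8 = 0
    pending = [Request(failed_map_priority, f"map-{i}", MAP_GB) for i in (1, 2)]
    # 10 reducers in total, 6 running, so 4 are still waiting for containers.
    pending += [Request(REDUCE_PRIORITY, f"reduce-{i}", REDUCER_GB) for i in range(7, 11)]
    before_kill = allocate(free_gb, pending)

    # Kill one running reducer: its container is freed, and we assume the
    # killed attempt is re-requested at reducer priority.
    free_gb += REDUCER_GB
    pending.append(Request(REDUCE_PRIORITY, "reduce-killed-retry", REDUCER_GB))
    after_kill = allocate(free_gb, pending)
    return before_kill, after_kill


if __name__ == "__main__":
    print(scenario(FAILED_MAP_PRIORITY))  # ([], ['reduce-7'])
    print(scenario(5))                    # ([], ['map-1'])
```

With failed maps ranked below reducers, the single freed gigabyte goes to a waiting reducer and neither failed map ever runs; ranking failed maps above reducers (the second call) lets the freed container go to a map instead.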



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
