hadoop-mapreduce-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
Date Fri, 16 Oct 2015 10:17:06 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960478#comment-14960478 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

The headroom is not very high (it sometimes comes back as 0 in the RM's response) because other heavy apps are running. We notice that the AM keeps ramping reducers up but effectively never ramps them down, so reducers are scheduled too aggressively. As the logs below show, only one ramp-down actually moves any reducers (651, at 04:53:42); every other ramp-down message reports 0.
Ramp-ups, on the other hand, keep happening.
{noformat}
2015-10-13 04:36:53,038 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:42,132 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:651
2015-10-13 04:53:43,135 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:44,137 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:45,140 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:46,143 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:47,146 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:48,149 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:49,152 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:50,155 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:51,158 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:52,161 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:53,164 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:54,167 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:55,170 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:56,181 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:57,184 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:58,187 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:53:59,190 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:00,193 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:01,205 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:02,208 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:03,211 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:04,213 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:05,216 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:06,219 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:07,221 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:08,225 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:09,228 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:10,231 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:11,235 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:12,239 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:13,242 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:14,245 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:15,248 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:16,276 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:17,280 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:18,283 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:19,286 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:20,289 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:21,292 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:22,295 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:23,298 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:24,301 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:25,304 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
2015-10-13 04:54:26,307 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping down all scheduled reduces:0
{noformat}

{noformat}
2015-10-13 04:37:39,685 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
All maps assigned. Ramping up all remaining reduces:651
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
2015-10-13 04:55:05,923 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 5
2015-10-13 04:55:06,929 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 5
2015-10-13 04:55:07,945 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:12,031 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
2015-10-13 04:55:13,053 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 12
2015-10-13 04:55:14,061 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
2015-10-13 04:55:16,075 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:55:17,092 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:20,147 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:21,165 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
2015-10-13 04:55:22,175 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:55:23,184 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 5
2015-10-13 04:55:24,197 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:29,299 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 8
2015-10-13 04:55:30,311 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 15
2015-10-13 04:55:31,320 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
2015-10-13 04:55:32,327 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:43,496 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:44,509 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 4
2015-10-13 04:55:45,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 5
2015-10-13 04:55:46,530 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 4
2015-10-13 04:55:47,543 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:55:57,680 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:55:58,698 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 6
2015-10-13 04:55:59,715 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 5
2015-10-13 04:56:00,721 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 6
2015-10-13 04:56:05,795 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:07,820 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 1
2015-10-13 04:56:08,831 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:09,841 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:10,853 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:22,018 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 15
2015-10-13 04:56:23,036 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:24,043 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 6
2015-10-13 04:56:29,114 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:31,138 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 4
2015-10-13 04:56:32,148 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 3
2015-10-13 04:56:33,157 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 3
2015-10-13 04:56:45,328 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 6
2015-10-13 04:56:46,349 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 2
2015-10-13 04:56:47,356 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 3
2015-10-13 04:56:57,499 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 8
2015-10-13 04:56:58,514 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 7
2015-10-13 04:56:59,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Ramping up 10
{noformat}
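To check the same imbalance across a complete AM log rather than an excerpt, the two kinds of messages can be tallied mechanically. The following is a minimal sketch, not part of Hadoop: the class name `RampLogTally` and its regular expressions are hypothetical and assume the exact message wording seen in the excerpts above (Hadoop 2.7.0). Because the patterns are not anchored at the start of the line, they match both the wrapped form shown here and unwrapped log lines. Requires Java 16+ (records).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: tallies reducer ramp-down / ramp-up messages written by
 * RMContainerAllocator. Patterns are assumptions copied from the 2.7.0 log excerpts;
 * other Hadoop versions may word these messages differently.
 */
public class RampLogTally {

  // "Ramping down all scheduled reduces:<n>"
  private static final Pattern RAMP_DOWN_ALL =
      Pattern.compile("Ramping down all scheduled reduces:(\\d+)");
  // "All maps assigned. Ramping up all remaining reduces:<n>"
  private static final Pattern RAMP_UP_ALL =
      Pattern.compile("Ramping up all remaining reduces:(\\d+)");
  // Incremental ramp-up: "Ramping up <n>" at the end of the line
  private static final Pattern RAMP_UP =
      Pattern.compile("Ramping up (\\d+)$");

  /** Aggregated counts; a "non-zero" ramp-down is one that actually moved reducers. */
  public record Tally(int rampDownEvents, int nonZeroRampDowns, long reducesRampedDown,
                      int rampUpEvents, long reducesRampedUp) {}

  public static Tally tally(List<String> lines) {
    int downEvents = 0, nonZeroDowns = 0, upEvents = 0;
    long down = 0, up = 0;
    for (String raw : lines) {
      String line = raw.trim();
      Matcher m;
      if ((m = RAMP_DOWN_ALL.matcher(line)).find()) {
        int n = Integer.parseInt(m.group(1));
        downEvents++;
        if (n > 0) {
          nonZeroDowns++;
        }
        down += n;
      } else if ((m = RAMP_UP_ALL.matcher(line)).find()
          || (m = RAMP_UP.matcher(line)).find()) {
        // m holds whichever ramp-up matcher succeeded
        upEvents++;
        up += Integer.parseInt(m.group(1));
      }
    }
    return new Tally(downEvents, nonZeroDowns, down, upEvents, up);
  }

  public static void main(String[] args) throws IOException {
    // Usage: java RampLogTally.java <path-to-am-syslog>
    System.out.println(tally(Files.readAllLines(Path.of(args[0]))));
  }
}
```

Run against the ramp-down excerpt above, the tally should report many ramp-down events but only one non-zero ramp-down, which is the pattern described in this comment.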
 

> MR job got hanged forever when one NM unstable for some time
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-6513
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Bob
>            Assignee: Varun Saxena
>            Priority: Critical
>
> While a job with many tasks was in progress, one node became unstable due to an OS issue. After the node became unstable, the maps on that node changed to the KILLED state.

> Currently, the maps that were running on the unstable node are rescheduled; they are all in the scheduled state, waiting for the RM to assign containers. We saw ask requests for the maps until the node became healthy again (all of those failed), but there were no ask requests after that. Meanwhile, the AM keeps preempting the reducers over and over.
> In the end, the reducers are waiting for the mappers to complete, and the mappers never get containers.
> My question is:
> ============
> Why did the AM not send map requests once the node recovered?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
