hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4672) RM with lost NMs results in massive log of AppAttemptId doesnt exist in cache
Date Mon, 24 Sep 2012 22:12:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462185#comment-13462185 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4672:
----------------------------------------------------

Chris, yes, I understand that your NM process is down. But as of now, when an NM goes down
it does not kill its containers. So I am sure your AM container process is still running and
still calling allocate() on the RM (see ApplicationMasterService.java:allocate(247) in the
call trace).

You have to stop this AM process, either by killing it manually or by checking the value of
AMResponse.getReboot() in your AM code and shutting the AM down when it returns true.
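
For the second option, a minimal sketch of such a heartbeat loop follows. The Response and
Scheduler interfaces and the heartbeatUntilReboot() helper are hypothetical stand-ins so the
example compiles without Hadoop on the classpath; only getReboot() is taken from the API
mentioned above. In a real AM the response would come from the allocate() call to the RM, and
the loop would sleep between heartbeats, which is omitted here.

```java
class RebootAwareHeartbeat {

    /** Hypothetical stand-in for the one AMResponse accessor used here. */
    interface Response {
        boolean getReboot();
    }

    /** Hypothetical stand-in for the AM's allocate() call to the RM. */
    interface Scheduler {
        Response allocate();
    }

    /**
     * Sends heartbeats until the RM sets the reboot flag or maxBeats is reached.
     * Returns the number of heartbeats sent.
     */
    static int heartbeatUntilReboot(Scheduler rm, int maxBeats) {
        int sent = 0;
        while (sent < maxBeats) {
            Response response = rm.allocate();
            sent++;
            if (response.getReboot()) {
                // The RM no longer recognizes this attempt: stop heartbeating
                // so the caller can clean up and exit the AM process.
                break;
            }
            // A real AM would process allocated containers and sleep here.
        }
        return sent;
    }

    public static void main(String[] args) {
        // Simulated RM that asks for a reboot on the third heartbeat.
        final int[] calls = {0};
        Scheduler rm = () -> {
            final boolean reboot = ++calls[0] >= 3;
            return () -> reboot;
        };
        System.out.println("heartbeats sent: " + heartbeatUntilReboot(rm, 10));
    }
}
```

The point of the sketch is only that the loop ends once the flag is set, instead of calling
allocate() forever against an RM that has already dropped the attempt.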

If this attempt isn't from your current job, it should be some stale AM left over from before.
                
> RM with lost NMs results in massive log of AppAttemptId doesnt exist in cache
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4672
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4672
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Chris Riccomini
>            Assignee: Vinod Kumar Vavilapalli
>
> Hey Guys,
> I'm running a 9 node cluster with 8 NMs and a single RM node. If I run an app master
> and have that app master start a container, then shut down all NMs, but leave the RM up (to
> simulate a failure), the containers time out and fail, as expected.
> What's unexpected is that my log then starts filling with:
> 2012-09-21 18:02:02,614 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:03,617 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:04,618 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:05,620 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:06,621 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:07,623 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:08,624 ERROR resourcemanager.ApplicationMasterService (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in cache appattempt_1348248013002_0001_000001
> Is there any way to shut this off/fix it? It just keeps going forever, until I bounce
> the RM node.
> Thanks!
> Chris

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
