hadoop-yarn-issues mailing list archives

From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
Date Wed, 10 Jul 2013 19:11:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704935#comment-13704935 ]

Omkar Vinit Joshi commented on YARN-541:
----------------------------------------

[~write2kishore] I just took a look at the NM logs, and I can see that container
"container_1366096597608_0001_01_000006" was allocated by the RM and that the AM made a start
container request for it on the NM (log excerpt below). So I think the problem is on the AM side.
Can you take another look at your AM code? It looks like something is getting missed there. If
the issue still occurs, can you log each start container request the AM makes to the NM? That
should show where the container is getting missed.

{code}
2013-04-16 03:29:57,681 INFO  [IPC Server handler 4 on 34660] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainer(402)) - Start request for container_1366096597608_0001_01_000006 by user dsadm
2013-04-16 03:29:57,684 INFO  [IPC Server handler 4 on 34660] nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=dsadm	IP=127.0.1.1	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1366096597608_0001	CONTAINERID=container_1366096597608_0001_01_000006
2013-04-16 03:29:57,687 INFO  [AsyncDispatcher event handler] application.Application (ApplicationImpl.java:transition(255)) - Adding container_1366096597608_0001_01_000006 to application application_1366096597608_0001
2013-04-16 03:29:57,689 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED
2013-04-16 03:29:57,952 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING
2013-04-16 03:29:58,475 INFO  [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,478 INFO  [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 5, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,481 INFO  [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 6, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,489 INFO  [ContainersLauncher #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(113)) - launchContainer: [bash, /tmp/nm-local-dir/usercache/dsadm/appcache/application_1366096597608_0001/container_1366096597608_0001_01_000006/default_container_executor.sh]
2013-04-16 03:29:58,638 INFO  [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(282)) - Container container_1366096597608_0001_01_000005 succeeded
2013-04-16 03:29:58,639 INFO  [ContainersLauncher #2] launcher.ContainerLaunch (ContainerLaunch.java:call(282)) - Container container_1366096597608_0001_01_000006 succeeded
2013-04-16 03:29:58,643 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000005 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO  [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container container_1366096597608_0001_01_000005
2013-04-16 03:29:58,693 INFO  [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container container_1366096597608_0001_01_000006
{code}
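Following the suggestion to log the AM's start container requests, a minimal sketch of AM-side bookkeeping might look like this. `StartRequestAudit` and its methods are hypothetical names, not part of the YARN API; plain strings stand in for container IDs, and the assumption is that you call `onAllocated` wherever your AM reads the allocate response and `onStartRequest` just before it asks an NM to start a container. The regular expression assumes the NM log wording shown in the excerpt above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical AM-side audit helper (not a YARN class). Plain strings stand in
// for container IDs; call onAllocated() for every container an allocate
// response hands you and onStartRequest() right before asking an NM to start it.
final class StartRequestAudit {
    // Assumes the NM log wording shown in this thread, e.g.
    // "Start request for container_1366096597608_0001_01_000006 by user dsadm".
    private static final Pattern NM_START =
            Pattern.compile("Start request for (container_[0-9_]+)");

    private final Set<String> allocated = new LinkedHashSet<>();
    private final Set<String> startRequested = new LinkedHashSet<>();

    void onAllocated(String containerId) {
        allocated.add(containerId);
        System.out.println("AM received allocated container " + containerId);
    }

    void onStartRequest(String containerId) {
        startRequested.add(containerId);
        System.out.println("AM sending start request for " + containerId);
    }

    // Container IDs that the NM log says it was asked to start.
    static Set<String> nmStartRequests(List<String> nmLogLines) {
        Set<String> ids = new TreeSet<>();
        for (String line : nmLogLines) {
            Matcher m = NM_START.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    // Started on the NM, but never recorded by the AM as allocated.
    Set<String> startedButNotRecordedAsAllocated(List<String> nmLogLines) {
        Set<String> result = new TreeSet<>(nmStartRequests(nmLogLines));
        result.removeAll(allocated);
        return result;
    }

    // Recorded by the AM as allocated, but never sent a start request.
    Set<String> allocatedButNotStarted() {
        Set<String> result = new TreeSet<>(allocated);
        result.removeAll(startRequested);
        return result;
    }
}

final class StartRequestAuditDemo {
    public static void main(String[] args) {
        StartRequestAudit audit = new StartRequestAudit();
        audit.onAllocated("container_1366096597608_0001_01_000005");
        audit.onStartRequest("container_1366096597608_0001_01_000005");
        // Suppose the AM's bookkeeping never recorded ..._000006.
        List<String> nmLog = Arrays.asList(
                "... Start request for container_1366096597608_0001_01_000005 by user dsadm",
                "... Start request for container_1366096597608_0001_01_000006 by user dsadm");
        System.out.println("Started but not recorded: "
                + audit.startedButNotRecordedAsAllocated(nmLog));
        System.out.println("Allocated but not started: " + audit.allocatedButNotStarted());
    }
}
```

Comparing both directions makes it easier to tell whether a container was missed when reading the allocate response or later, on the path that launches it.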
                
> getAllocatedContainers() is not returning all the allocated containers
> ----------------------------------------------------------------------
>
>                 Key: YARN-541
>                 URL: https://issues.apache.org/jira/browse/YARN-541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha
>         Environment: Redhat Linux 64-bit
>            Reporter: Krishna Kishore Bonagiri
>         Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and works well with, hadoop-2.0.0-alpha,
> but when I run the same application against 2.0.3-alpha, the getAllocatedContainers() method
> called on AMResponse sometimes does not return all the allocated containers. For example, I
> request 10 containers and this method sometimes gives me only 9, even though the Resource
> Manager log shows that the 10th container was also allocated. It happens only occasionally,
> at random, and works fine the rest of the time. If I send one more request to the RM for the
> remaining container after it failed to return it the first time (and before releasing the
> already acquired ones), the RM is able to allocate that container. I am running only one
> application at a time, but thousands of them one after another.
> My main worry is that even though the RM's log says all 10 requested containers were
> allocated, getAllocatedContainers() does not return all of them; surprisingly, it returned
> only 9. I never saw this kind of issue in the previous version, hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  
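For context on the 9-of-10 symptom: as we understand the AM-RM protocol, AMResponse.getAllocatedContainers() reports only the containers newly allocated since the previous allocate call, so an AM typically keeps heartbeating and accumulates containers until it has everything it asked for. Below is a minimal sketch of that accumulation, assuming plain string IDs in place of YARN `Container` objects; `ContainerAccumulator` is a hypothetical helper, not a Hadoop class, and the IDs in the demo are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Hadoop class): accumulates container IDs that
// arrive over several allocate responses until the requested count is met.
final class ContainerAccumulator {
    private final int requested;
    // Insertion-ordered set; also guards against counting the same ID twice.
    private final Set<String> received = new LinkedHashSet<>();

    ContainerAccumulator(int requested) {
        this.requested = requested;
    }

    // Call once per allocate response with the containers it reported.
    // Returns only the IDs that had not been seen before.
    List<String> onResponse(List<String> newlyAllocated) {
        List<String> fresh = new ArrayList<>();
        for (String id : newlyAllocated) {
            if (received.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    // How many requested containers have not arrived yet.
    int outstanding() {
        return Math.max(0, requested - received.size());
    }

    boolean isComplete() {
        return received.size() >= requested;
    }
}

final class ContainerAccumulatorDemo {
    public static void main(String[] args) {
        ContainerAccumulator acc = new ContainerAccumulator(10);
        // Placeholder IDs: the first response carries 9 containers...
        List<String> first = new ArrayList<>();
        for (int i = 1; i <= 9; i++) {
            first.add("container_" + i);
        }
        acc.onResponse(first);
        System.out.println("outstanding after 1st response: " + acc.outstanding());
        // ...and the 10th shows up in a later response.
        acc.onResponse(Collections.singletonList("container_10"));
        System.out.println("complete: " + acc.isComplete());
    }
}
```

An AM structured this way keeps calling allocate while `outstanding()` is positive, rather than treating one short response as a lost container.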

--
This message is automatically generated by JIRA.
