hadoop-yarn-issues mailing list archives

From "David Yan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6008) Fetch container list for failed application attempt
Date Tue, 03 Jan 2017 06:20:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794283#comment-15794283
] 

David Yan commented on YARN-6008:
---------------------------------

[~sunilg] In my case, the current attempt is actually not reusing the containers from the
previous attempt. The previous attempt has containers that are terminated as a result of an
AM failure. After the entire application is killed, the call actually returns the right containers.
That's why I think the current behavior does not make sense.

Here's an example:

The application has one failed attempt (01), and the current attempt is 02. The following call
asks for the containers of the previous attempt (01), but returns the containers of the current attempt (02):

{code}
$ ~/hadoop/bin/yarn container -list appattempt_1483391433545_0002_000001
17/01/02 22:15:43 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
17/01/02 22:15:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/01/02 22:15:43 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
Total number of containers :4
                  Container-Id	          Start Time	         Finish Time	               State	                Host	   Node Http Address	                            LOG-URL
container_1483391433545_0002_02_000005	Mon Jan 02 19:17:02 -0800 2017	                 N/A	             RUNNING	  david-ubuntu:37617	http://david-ubuntu:8042	http://david-ubuntu:8042/node/containerlogs/container_1483391433545_0002_02_000005/david
container_1483391433545_0002_02_000004	Mon Jan 02 19:17:01 -0800 2017	                 N/A	             RUNNING	  david-ubuntu:37617	http://david-ubuntu:8042	http://david-ubuntu:8042/node/containerlogs/container_1483391433545_0002_02_000004/david
container_1483391433545_0002_02_000002	Mon Jan 02 19:16:59 -0800 2017	                 N/A	             RUNNING	  david-ubuntu:37617	http://david-ubuntu:8042	http://david-ubuntu:8042/node/containerlogs/container_1483391433545_0002_02_000002/david
container_1483391433545_0002_02_000001	Mon Jan 02 19:16:48 -0800 2017	                 N/A	             RUNNING	  david-ubuntu:37617	http://david-ubuntu:8042	http://david-ubuntu:8042/node/containerlogs/container_1483391433545_0002_02_000001/david
{code}
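
The mismatch in the listing above can also be checked mechanically from the IDs alone. The following is only a minimal sketch, not part of YARN: the helper names (attempt_key, container_attempt_key, foreign_containers) are hypothetical, and it assumes the ID layouts seen in this message, that is appattempt_<clusterTimestamp>_<appId>_<attempt> and container_<clusterTimestamp>_<appId>_<attempt>_<containerNumber>. Other ID variants are not handled.

```python
import re

# Assumed ID layouts, inferred from the IDs shown in this message:
#   appattempt_<clusterTimestamp>_<appId>_<attempt>
#   container_<clusterTimestamp>_<appId>_<attempt>_<containerNumber>
ATTEMPT_RE = re.compile(r"^appattempt_(\d+)_(\d+)_(\d+)$")
CONTAINER_RE = re.compile(r"^container_(\d+)_(\d+)_(\d+)_(\d+)$")


def attempt_key(attempt_id):
    """Return (clusterTimestamp, appId, attempt) for an application attempt ID."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("unrecognized application attempt ID: " + attempt_id)
    return tuple(int(group) for group in match.groups())


def container_attempt_key(container_id):
    """Return (clusterTimestamp, appId, attempt) of the attempt that owns a container."""
    match = CONTAINER_RE.match(container_id)
    if match is None:
        raise ValueError("unrecognized container ID: " + container_id)
    # Drop the trailing container number; only the owning attempt matters here.
    return tuple(int(group) for group in match.groups()[:3])


def foreign_containers(attempt_id, container_ids):
    """Return the container IDs that do not belong to the requested attempt."""
    wanted = attempt_key(attempt_id)
    return [cid for cid in container_ids if container_attempt_key(cid) != wanted]


if __name__ == "__main__":
    listed = [
        "container_1483391433545_0002_02_000005",
        "container_1483391433545_0002_02_000004",
    ]
    # Every listed container belongs to attempt 02, so all of them are reported.
    print(foreign_containers("appattempt_1483391433545_0002_000001", listed))
```

With the IDs from the listing above, every container is reported as foreign to attempt 01, which is the behavior described in this comment.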

Now, let's kill the application:

{code}
$ ~/hadoop/bin/yarn application -kill application_1483391433545_0002
17/01/02 22:18:39 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
17/01/02 22:18:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/01/02 22:18:39 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
Killing application application_1483391433545_0002
17/01/02 22:18:39 INFO impl.YarnClientImpl: Killed application application_1483391433545_0002
{code}

Now, let's execute the previous call again that asks for the containers from attempt 01:

{code}
$ ~/hadoop/bin/yarn container -list appattempt_1483391433545_0002_000001
17/01/02 22:19:28 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
17/01/02 22:19:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/01/02 22:19:29 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
Total number of containers :4
                  Container-Id	          Start Time	         Finish Time	               State	                Host	   Node Http Address	                            LOG-URL
container_1483391433545_0002_01_000004	Mon Jan 02 13:29:08 -0800 2017	Mon Jan 02 19:16:47 -0800 2017	            COMPLETE	  david-ubuntu:37617	                 N/A	http://0.0.0.0:8188/applicationhistory/logs/david-ubuntu:37617/container_1483391433545_0002_01_000004/container_1483391433545_0002_01_000004/david
container_1483391433545_0002_01_000003	Mon Jan 02 13:29:07 -0800 2017	Mon Jan 02 19:16:45 -0800 2017	            COMPLETE	  david-ubuntu:37617	                 N/A	http://0.0.0.0:8188/applicationhistory/logs/david-ubuntu:37617/container_1483391433545_0002_01_000003/container_1483391433545_0002_01_000003/david
container_1483391433545_0002_01_000002	Mon Jan 02 13:29:06 -0800 2017	Mon Jan 02 19:16:47 -0800 2017	            COMPLETE	  david-ubuntu:37617	                 N/A	http://0.0.0.0:8188/applicationhistory/logs/david-ubuntu:37617/container_1483391433545_0002_01_000002/container_1483391433545_0002_01_000002/david
container_1483391433545_0002_01_000001	Mon Jan 02 13:28:56 -0800 2017	Mon Jan 02 19:16:47 -0800 2017	            COMPLETE	  david-ubuntu:37617	                 N/A	http://0.0.0.0:8188/applicationhistory/logs/david-ubuntu:37617/container_1483391433545_0002_01_000001/container_1483391433545_0002_01_000001/david
{code}


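As you can see, once the application has been killed, the call returns the containers of the
requested attempt (01), but while the application is still running it returns the containers
of the current attempt (02) instead. That's why I think the current behavior does not make sense.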

Hope the above helps!

> Fetch container list for failed application attempt
> ---------------------------------------------------
>
>                 Key: YARN-6008
>                 URL: https://issues.apache.org/jira/browse/YARN-6008
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: hadoop version 2.6
>            Reporter: Priyanka Gugale
>
> When we run the command "yarn container -list" with a failed application attempt, we should
> either get the containers from that attempt or get back an empty list, as those containers are
> no longer in the running state.
> Steps to reproduce:
> 1. Launch a yarn application. 
> 2. Kill app master, it tries to restart application with new attempt id. 
> 3. Now run yarn command,
> yarn container -list <Application Attempt ID>
> Where Application Attempt ID is of failed attempt, 
> it lists the container from next attempt which is in "RUNNING" state right now.
> Expected behavior:
> It should return list of killed containers from attempt 1 or empty list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

