hadoop-yarn-issues mailing list archives

From "Yesha Vora (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7065) [RM UI] App status not getting updated in "All application" page
Date Fri, 08 Sep 2017 23:18:02 GMT

     [ https://issues.apache.org/jira/browse/YARN-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora updated YARN-7065:
-----------------------------
    Attachment: Screen Shot 2017-09-08 at 4.14.51 PM.png
                Screen Shot 2017-09-08 at 4.15.07 PM.png

> [RM UI] App status not getting updated in "All application" page
> ----------------------------------------------------------------
>
>                 Key: YARN-7065
>                 URL: https://issues.apache.org/jira/browse/YARN-7065
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yesha Vora
>         Attachments: Screen Shot 2017-09-08 at 4.14.51 PM.png, Screen Shot 2017-09-08 at 4.15.07 PM.png
>
>
> Scenario:
> 1) Run a long-running Spark application
> 2) Trigger RM and NN failovers at random
> 3) Validate the application state in YARN
> The Spark applications have finished. The YARN CLI returns the correct status for the application:
> {code}
> [hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
> 17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1 xxx.xx.xx.x:10200
> 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
> 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
> Application Report : 
> 	Application-Id : application_1503203977699_0014
> 	Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
> 	Application-Type : SPARK
> 	User : hrt_qa
> 	Queue : default
> 	Application Priority : null
> 	Start-Time : 1503215983532
> 	Finish-Time : 1503250203806
> 	Progress : 0%
> 	State : FAILED
> 	Final-State : FAILED
> 	Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014
> 	RPC Port : -1
> 	AM Host : N/A
> 	Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds
> 	Log Aggregation Status : SUCCEEDED
> 	Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_000020 exited with  exitCode: 1
> For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014 Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e04_1503203977699_0014_20_000001
> Exit code: 1
> Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
> 	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Shell output: main : command provided 1
> main : run as user is hrt_qa
> main : requested yarn user is hrt_qa
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_000001/container_e04_1503203977699_0014_20_000001.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> 	Unmanaged Application : false
> 	Application Node Label Expression : <Not set>
> 	AM container Node Label Expression : <DEFAULT_PARTITION>{code}
> However, the RM UI "All Applications" page still shows the application in the RUNNING state:
> https://host1:8090/cluster
> Clicking the application ID (https://host1:8090/cluster/app/application_1503203977699_0014) redirects to the application page, which shows the correct application state, FAILED.

> The application status is not being updated on the YARN "All Applications" page.
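
A minimal sketch of how the mismatch described above might be checked without a browser, using the ResourceManager REST endpoints /ws/v1/cluster/apps (the list) and /ws/v1/cluster/apps/{appid} (a single application). It assumes the RM web address from this report is reachable without authentication, which may not hold on a secured cluster, and it assumes the REST list reflects the same state as the "All Applications" page, which this report does not confirm. The function names and the fetch_json helper are hypothetical, introduced only for illustration.

```python
import json
from urllib.request import Request, urlopen


def state_from_list(apps_json, app_id):
    """Return the state of app_id in a /ws/v1/cluster/apps response, or None."""
    # The RM may return {"apps": null} when there are no applications.
    apps = (apps_json.get("apps") or {}).get("app") or []
    for app in apps:
        if app.get("id") == app_id:
            return app.get("state")
    return None


def state_from_detail(app_json):
    """Return the state in a /ws/v1/cluster/apps/{appid} response, or None."""
    return (app_json.get("app") or {}).get("state")


def find_mismatch(list_json, detail_json, app_id):
    """Return (listed_state, detail_state) if they differ, else None."""
    listed = state_from_list(list_json, app_id)
    detailed = state_from_detail(detail_json)
    return (listed, detailed) if listed != detailed else None


def fetch_json(url):
    """Hypothetical helper: GET a URL and parse the body as JSON (no auth)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For the application in this report, the two inputs would come from fetch_json("https://host1:8090/ws/v1/cluster/apps") and fetch_json("https://host1:8090/ws/v1/cluster/apps/application_1503203977699_0014"). A non-None result from find_mismatch would show that the RM itself reports different states in the list and per-application views. A None result would suggest the stale state comes from how the list page is rendered.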



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

