hadoop-yarn-dev mailing list archives

From "Yesha Vora (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-7065) [RM UI] App status not getting updated in "All application" page
Date Mon, 21 Aug 2017 21:17:00 GMT
Yesha Vora created YARN-7065:

             Summary: [RM UI] App status not getting updated in "All application" page
                 Key: YARN-7065
                 URL: https://issues.apache.org/jira/browse/YARN-7065
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Yesha Vora

1) Run a long-running Spark application
2) Trigger RM and NN failovers at random
3) Check the application state in YARN

The Spark applications have finished. The YARN CLI returns the correct status for the application:
[hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
Application Report : 
	Application-Id : application_1503203977699_0014
	Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
	Application-Type : SPARK
	User : hrt_qa
	Queue : default
	Application Priority : null
	Start-Time : 1503215983532
	Finish-Time : 1503250203806
	Progress : 0%
	State : FAILED
	Final-State : FAILED
	Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014
	RPC Port : -1
	AM Host : N/A
	Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds
	Log Aggregation Status : SUCCEEDED
	Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_000020 exited with  exitCode: 1
For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014
Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1503203977699_0014_20_000001
Exit code: 1
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Shell output: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_000001/container_e04_1503203977699_0014_20_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>

However, the RM UI "All application" page still shows the application in the "RUNNING" state.
Clicking the application ID (https://host1:8090/cluster/app/application_1503203977699_0014)
redirects to the application page, which shows the correct application state: Failed.

The app status is not being updated on the YARN "All application" page.
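
To make the check in step 3 repeatable, the CLI report can be parsed and compared against the state seen in the UI. The following is only a minimal sketch: it assumes the report uses tab-indented "Key : Value" lines as in the output above, and the helper names parse_application_report and state_mismatch are hypothetical, not part of YARN. The UI state is passed in by hand rather than scraped from the page.

```python
import re

# Assumption: report fields are tab-indented "Key : Value" lines, as in the
# `yarn application -status` output quoted in this report.
_FIELD = re.compile(r"^\t([^:\t]+?) : (.*)$")


def parse_application_report(text):
    """Return a dict of report fields, keeping the first occurrence of each key."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields.setdefault(match.group(1).strip(), match.group(2).strip())
    return fields


def state_mismatch(report_text, ui_state):
    """Return (cli_state, ui_state) when the two states differ, otherwise None."""
    cli_state = parse_application_report(report_text).get("State")
    if cli_state is not None and cli_state.upper() != ui_state.upper():
        return (cli_state, ui_state)
    return None
```

Feeding it the report above together with the state shown on the "All application" page (RUNNING) would return a mismatch pair, while the state shown on the per-application page (Failed) would return None.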
