hadoop-yarn-issues mailing list archives

From "Prabhu Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries
Date Sat, 02 Mar 2019 11:07:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782340#comment-16782340
] 

Prabhu Joseph commented on YARN-8132:
-------------------------------------

[~Rakesh_Shah] It gets triggered at {{RMAppAttemptEventType.FAIL}} when the yarn client calls
failApplicationAttempt (yarn application -fail). When a job fails because its tasks fail, the
AM UnregisterEvent sets finalApplicationStatus to FAILED. But the fix misses failure cases
like AM Crash and AM Expire, where the AM UnregisterEvent won't be present.

[~bibinchundatt] The given fix works for Killed cases (including job timeout) and for failure
cases such as tasks failing and the client initiating failApplicationAttempt, but not for AM Crash
and AM Expire. Can we handle those in a separate Jira, or should we continue with this one?
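
To make the remaining gap concrete, here is a minimal sketch of the fallback being discussed: when no final status arrived from an AM unregister (as with AM Crash or AM Expire), derive it from the terminal application state instead of leaving it UNDEFINED. This is not the attached patch. The class {{FinalStatusFallback}}, its enums and the {{resolve}} method are hypothetical stand-ins and are not the ResourceManager's own types.

```java
import java.util.Optional;

// Hypothetical illustration only; these are not the YARN ResourceManager classes.
class FinalStatusFallback {
    // Simplified stand-ins for the terminal application states and final statuses.
    enum AppState { FINISHED, FAILED, KILLED }
    enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    // Prefer the status the AM reported on unregister; otherwise fall back
    // to a status derived from the terminal application state.
    static FinalStatus resolve(AppState state, Optional<FinalStatus> reportedByAm) {
        if (reportedByAm.isPresent() && reportedByAm.get() != FinalStatus.UNDEFINED) {
            return reportedByAm.get();
        }
        switch (state) {
            case KILLED:
                return FinalStatus.KILLED;
            case FAILED:
                // Covers cases with no AM unregister, e.g. AM Crash or AM Expire.
                return FinalStatus.FAILED;
            default:
                return FinalStatus.UNDEFINED;
        }
    }

    public static void main(String[] args) {
        // No AM unregister was received: status is derived from the app state.
        System.out.println(resolve(AppState.KILLED, Optional.empty()));
        System.out.println(resolve(AppState.FAILED, Optional.empty()));
        // AM unregistered with an explicit status: that status is kept.
        System.out.println(resolve(AppState.FINISHED, Optional.of(FinalStatus.SUCCEEDED)));
    }
}
```

With a fallback of this shape, a FAILED application would report FAILED even when the AM never unregistered. Whether that belongs in this Jira or a follow-up is the open question above.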



> Final Status of applications shown as UNDEFINED in ATS app queries
> ------------------------------------------------------------------
>
>                 Key: YARN-8132
>                 URL: https://issues.apache.org/jira/browse/YARN-8132
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2, timelineservice
>            Reporter: Charan Hebri
>            Assignee: Prabhu Joseph
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: YARN-8132-001.patch, YARN-8132-002.patch, YARN-8132-003.patch, YARN-8132-004.patch,
YARN-8132-branch-3.1.001.patch, YARN-8132-branch-3.2.001.patch, YARN-8132-branch-3.2.002.patch
>
>
> Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A sample
request/response with INFO field for an application,
> {noformat}
> 2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1693))
- Received URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user hrt_qa
> 2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1716))
- Processed URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 ms.){noformat}
> {noformat}
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1523263360719,
>   "idprefix": 0,
>   "id": "application_1523259757659_0003",
>   "type": "YARN_APPLICATION",
>   "info": {
>     "YARN_APPLICATION_CALLER_CONTEXT": "CLI",
>     "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application application_1523259757659_0003
was killed by user xxx_xx at XXX.XXX.XXX.XXX",
>     "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
>     "YARN_APPLICATION_NAME": "Sleep job",
>     "YARN_APPLICATION_USER": "hrt_qa",
>     "YARN_APPLICATION_UNMANAGED_APPLICATION": false,
>     "FROM_ID": "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
>     "UID": "yarn-cluster!application_1523259757659_0003",
>     "YARN_APPLICATION_VIEW_ACLS": " ",
>     "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
>     "YARN_AM_CONTAINER_LAUNCH_COMMAND": [
>       "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
-Dhadoop.root.logfile=syslog -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
"
>     ],
>     "YARN_APPLICATION_QUEUE": "default",
>     "YARN_APPLICATION_TYPE": "MAPREDUCE",
>     "YARN_APPLICATION_PRIORITY": 0,
>     "YARN_APPLICATION_LATEST_APP_ATTEMPT": "appattempt_1523259757659_0003_000001",
>     "YARN_APPLICATION_TAGS": [
>       "timeline_flow_name_tag:test_flow"
>     ],
>     "YARN_APPLICATION_STATE": "KILLED"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{noformat}
> This is different to what the Resource Manager reports. For KILLED applications the
final status is KILLED and for FAILED applications it is FAILED. This behavior is seen in
ATSv2 as well as older versions of ATS. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

