hadoop-yarn-issues mailing list archives

From "Young Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8895) Improve YARN Error diagnostics
Date Fri, 19 Oct 2018 23:42:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657567#comment-16657567
] 

Young Chen commented on YARN-8895:
----------------------------------

Currently, identifying the source of an error can be difficult because errors are written into an
unstructured "diagnostics" string field. This field is present in container statuses returned to the
RM and in application attempts in the RM. Such errors are hard to classify without hard-coding
searches over the diagnostic strings.

This Jira aims to add a structured error field in NM and RM that preserves failure information
and source component to enable faster and clearer error diagnosis.

Old error:

E.g.: 
Application application_1539325316309_0001 failed 1 times due to AM Container for appattempt_1539325316309_0001_000001 exited with exitCode: 57005
For more detailed output, check application tracking page: http://XXXXXXXX:80/cluster/app/application_1539325316309_0001 Then click on links to logs of each attempt.
Diagnostics: Container exited with a non-zero exit code 57005
Failing this attempt. Failing the application.
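
To illustrate why this is brittle, here is a minimal Python sketch of the kind of hard-coded string search the old format forces on a client. The helper name `classify_diagnostics` and the returned keys are hypothetical and not part of YARN; the regular expression assumes the exact wording of the message above, so any change to that wording would break it.

```python
import re

# Pattern tied to the exact wording of the old diagnostics text.
# This is an assumption: any change to the message breaks the match.
EXIT_CODE_RE = re.compile(r"exited with\s+exitCode:\s*(-?\d+)")


def classify_diagnostics(diagnostics: str) -> dict:
    """Hypothetical helper: guess the failure from free-form diagnostics."""
    match = EXIT_CODE_RE.search(diagnostics)
    exit_code = int(match.group(1)) if match else None
    # The failing component can only be inferred from a substring.
    component = "AM" if "AM Container" in diagnostics else "UNKNOWN"
    return {"component": component, "exit_code": exit_code}


old_error = (
    "Application application_1539325316309_0001 failed 1 times due to "
    "AM Container for appattempt_1539325316309_0001_000001 "
    "exited with exitCode: 57005\n"
    "Diagnostics: Container exited with a non-zero exit code 57005\n"
    "Failing this attempt. Failing the application."
)
result = classify_diagnostics(old_error)
```

Both the component and the exit code are recovered only by matching literal substrings, which is the coupling the structured field is meant to remove.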
 
Proposed new error example:
{code:java}
{"errors": [{
  "errorId": "E_SYSTEM_AM_AMCRASHED",
  "name": "AM_CRASHED",
  "severity": "Error",
  "component": "AM",
  "source": "System",
  "exitType": "CONTAINER_FINISHED",
  "containerStatus": 57005,
  "description": "Application attempt appattempt_1539325316309_0001_000001 encountered an error",
  "helpLink": "http://XXXXXXXXXXX:80/proxy/application_1539325316309_0001/"
}]}
{code}
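
As a rough sketch of how a client might consume such a structured field, the Python snippet below parses the example payload and reads the component, source, error ID, and container status by key instead of by string search. The helper name `summarize_errors` is hypothetical, and the field names are taken only from the example above; the final schema may differ.

```python
import json

# The proposed payload from the example above (host placeholder kept as-is).
payload = """
{"errors":[{"errorId":"E_SYSTEM_AM_AMCRASHED",
"name":"AM_CRASHED","severity":"Error",
"component":"AM",
"source":"System",
"exitType":"CONTAINER_FINISHED","containerStatus":57005,
"description":"Application attempt appattempt_1539325316309_0001_000001 encountered an error",
"helpLink":"http://XXXXXXXXXXX:80/proxy/application_1539325316309_0001/"}]}
"""


def summarize_errors(raw: str) -> list:
    """Hypothetical helper: one (component, source, errorId, containerStatus) tuple per error."""
    doc = json.loads(raw)
    return [
        (e.get("component"), e.get("source"), e.get("errorId"), e.get("containerStatus"))
        for e in doc.get("errors", [])
    ]


summary = summarize_errors(payload)
```

Classification then becomes a lookup on `errorId` or `component` rather than a pattern match on message text.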
 
 

> Improve YARN Error diagnostics
> ------------------------------
>
>                 Key: YARN-8895
>                 URL: https://issues.apache.org/jira/browse/YARN-8895
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Young Chen
>            Assignee: Young Chen
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

