hadoop-yarn-issues mailing list archives

From "Young Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8895) Improve YARN Error diagnostics
Date Mon, 29 Oct 2018 18:01:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667544#comment-16667544 ]

Young Chen commented on YARN-8895:
----------------------------------

Hi [~leftnoteasy] -

1 - I agree that we should make this a new field and leave diagnostics as is until we have
equivalent or better functionality with the structured errors.

2 - The most important changes will be:
 * Code in the NM and the RM to construct these structured errors as close to the source as possible
 * Protobuf changes for the NM to communicate these errors to the RM, possibly in container status
reports when the exit was abnormal
 * Updates to RM failover so that these structured errors are preserved across restarts

As for details, I'm still debating whether a pluggable structured error implementation would
be worth it - I think most error structures are very similar: error code, message,
description, source component, user/system, etc.
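
To make the field list above concrete, here is a minimal sketch of what a single, non-pluggable error structure could look like. Everything in it is an assumption for illustration: StructuredError, Source, Origin and the accessor names are hypothetical and are not existing YARN classes or APIs.

```java
import java.util.Objects;

// Hypothetical sketch only: none of these types exist in YARN today.
final class StructuredError {
    // Component that produced the error (assumed set of values).
    enum Source { NODE_MANAGER, RESOURCE_MANAGER }

    // Whether the failure is attributed to the user's application or to the platform.
    enum Origin { USER, SYSTEM }

    private final int code;            // machine-readable error code
    private final String message;      // short summary
    private final String description;  // longer human-readable detail
    private final Source source;       // where the error was constructed
    private final Origin origin;       // user vs. system classification

    StructuredError(int code, String message, String description,
                    Source source, Origin origin) {
        this.code = code;
        this.message = Objects.requireNonNull(message, "message");
        this.description = Objects.requireNonNull(description, "description");
        this.source = Objects.requireNonNull(source, "source");
        this.origin = Objects.requireNonNull(origin, "origin");
    }

    int getCode() { return code; }
    String getMessage() { return message; }
    String getDescription() { return description; }
    Source getSource() { return source; }
    Origin getOrigin() { return origin; }

    // Classification becomes a field comparison rather than a string search.
    boolean isUserError() { return origin == Origin.USER; }
}
```

With a structure like this, a consumer could classify a failure by comparing getSource() and getOrigin() instead of searching the diagnostics string.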

What do you think?

 

> Improve YARN Error diagnostics
> ------------------------------
>
>                 Key: YARN-8895
>                 URL: https://issues.apache.org/jira/browse/YARN-8895
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Young Chen
>            Assignee: Young Chen
>            Priority: Minor
>
> Currently, identifying error sources can be quite difficult, as they are written into
> an unstructured string "diagnostics" field. This is present in container statuses returned
> to the RM and in application attempts in the RM. These errors are difficult to classify without
> hard-coding diagnostic string searches.
> This Jira aims to add a structured error field in NM and RM that preserves failure information
> and source component to enable faster and clearer error diagnosis.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

