tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-1267) Exception handling for VertexManager
Date Thu, 23 Oct 2014 01:19:34 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180863#comment-14180863 ]

Bikas Saha commented on TEZ-1267:
---------------------------------

For code like the snippet below, and other similar call sites, would it have made sense to
put all the logging and tryEnactKill code into a common method to reduce verbosity and
duplication? The VertexManager code is all Tez code. Maybe it could do all of this itself and
return a diagnostic object, because the real try-catch is in there, around the user code. That
would avoid having the Tez VertexImpl code do a try-catch around Tez VertexManager code.
{code}
-    vertexManager.onVertexStarted(pendingReportedSrcCompletions);
+    try {
+      vertexManager.onVertexStarted(pendingReportedSrcCompletions);
+    } catch (AMUserCodeException e) {
+      String msg = "Exception in " + e.getSource() +", vertex=" + logIdentifier;
+      LOG.error(msg, e);
+      addDiagnostic(msg + "," + ExceptionUtils.getStackTrace(e.getCause()));
+      tryEnactKill(VertexTerminationCause.AM_USERCODE_FAILURE, TaskTerminationCause.AM_USERCODE_FAILURE);
+      return VertexState.TERMINATING;
+    }{code}
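
A minimal sketch of the common-method idea, not the actual patch. The types below (AMUserCodeException, VertexState, VertexSketch, VertexManagerCall, callVertexManager) are simplified, hypothetical stand-ins so the example compiles on its own; the real Tez classes have different shapes, and the killRequested flag only stands in for the tryEnactKill call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Tez types, reduced to what the sketch needs.
class AMUserCodeException extends Exception {
  private final String source;

  AMUserCodeException(String source, Throwable cause) {
    super(cause);
    this.source = source;
  }

  String getSource() {
    return source;
  }
}

enum VertexState { RUNNING, TERMINATING }

class VertexSketch {
  final String logIdentifier;
  final List<String> diagnostics = new ArrayList<>();
  boolean killRequested = false; // stands in for tryEnactKill(...)

  VertexSketch(String logIdentifier) {
    this.logIdentifier = logIdentifier;
  }

  /** A call into VertexManager code that may surface a user-code failure. */
  interface VertexManagerCall {
    void run() throws AMUserCodeException;
  }

  /**
   * Common handler: runs the call and returns successState if it completes.
   * On AMUserCodeException it records one diagnostic, requests the kill,
   * and returns TERMINATING, so call sites do not repeat the catch block.
   */
  VertexState callVertexManager(VertexManagerCall call, VertexState successState) {
    try {
      call.run();
      return successState;
    } catch (AMUserCodeException e) {
      String msg = "Exception in " + e.getSource() + ", vertex=" + logIdentifier;
      diagnostics.add(msg + "," + e.getCause());
      killRequested = true;
      return VertexState.TERMINATING;
    }
  }
}
```

Under these assumptions, a call site would shrink to something like callVertexManager(() -> vertexManager.onVertexStarted(pendingReportedSrcCompletions), nextState), where nextState is whatever state the transition would otherwise return. The alternative raised above, having the VertexManager return a diagnostic object instead of throwing, would remove the try-catch from the vertex code entirely.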

> Exception handling for VertexManager
> ------------------------------------
>
>                 Key: TEZ-1267
>                 URL: https://issues.apache.org/jira/browse/TEZ-1267
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>             Fix For: 0.5.2
>
>         Attachments: TEZ-1267-2.patch, TEZ-1267-3.patch, TEZ-1267-4.patch, Tez-1267.patch
>
>
> Events are generated by user code. In some places they're also handled by user code within
> the AM. Currently, exceptions which are generated when handling user code will end up killing
> the AM (and hence lead to a retry).
> Instead, failure to handle such events should cause the application to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
