tez-issues mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-2311) AM can hang if kill received while recovering from previous attempt
Date Thu, 23 Jul 2015 20:10:04 GMT

    [ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639432#comment-14639432 ]

Jeff Zhang commented on TEZ-2311:

bq. The DAGImpl change does not seem right. When vertex A is killed, why is Vertex B being
killed by the DAG? The DAG should be triggering a kill for all vertices or a sub-set of them
on certain conditions. Adding this code creates a loop of events.

The kill would happen only once, because we check the DAG's terminationCause before
killing the vertices:
  void enactKill(DAGTerminationCause dagTerminationCause,
      VertexTerminationCause vertexTerminationCause) {

    if (trySetTerminationCause(dagTerminationCause)) {
      for (Vertex v : vertices.values()) {
        eventHandler.handle(
            new VertexEventTermination(v.getVertexId(), vertexTerminationCause));
      }
    }
  }
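
To make the "only once" argument concrete, here is a minimal, self-contained sketch of a set-once guard. It is an illustration under assumptions, not the Tez implementation: KillGuardSketch, markTerminating, sentKillEvents and the string vertex ids are hypothetical stand-ins, and a plain list replaces the real event handler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a DAG; none of these names are Tez APIs.
public class KillGuardSketch {
    enum TerminationCause { DAG_KILL, VERTEX_FAILURE }

    // null means "not terminating yet"; the first cause to arrive is kept.
    private final AtomicReference<TerminationCause> terminationCause =
            new AtomicReference<>();
    private final List<String> vertexIds;
    // Records the kill events that would be dispatched to vertices.
    final List<String> sentKillEvents = new ArrayList<>();

    KillGuardSketch(List<String> vertexIds) {
        this.vertexIds = vertexIds;
    }

    // Returns true only for the first caller; later calls leave the cause unchanged.
    boolean markTerminating(TerminationCause cause) {
        return terminationCause.compareAndSet(null, cause);
    }

    TerminationCause getTerminationCause() {
        return terminationCause.get();
    }

    // Fans the kill out to every vertex, but only if this call set the cause.
    void enactKill(TerminationCause cause) {
        if (markTerminating(cause)) {
            for (String id : vertexIds) {
                sentKillEvents.add("KILL " + id);
            }
        }
    }

    public static void main(String[] args) {
        KillGuardSketch dag = new KillGuardSketch(Arrays.asList("A", "B"));
        dag.enactKill(TerminationCause.VERTEX_FAILURE); // first trigger fans out
        dag.enactKill(TerminationCause.DAG_KILL);       // second trigger is a no-op
        System.out.println(dag.sentKillEvents);         // prints [KILL A, KILL B]
    }
}
```

In this sketch a second kill trigger, for example one caused by a vertex reporting back that it was killed, finds the cause already set and sends nothing, so the events cannot loop.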

Posting another patch to fix the null check on vertex.tasks.

> AM can hang if kill received while recovering from previous attempt
> -------------------------------------------------------------------
>                 Key: TEZ-2311
>                 URL: https://issues.apache.org/jira/browse/TEZ-2311
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>            Assignee: Jeff Zhang
>              Labels: Recovery
>         Attachments: TEZ-2311-1.patch, TEZ-2311-2.patch
> We saw an instance of a Tez job hanging despite receiving multiple kill requests from
> clients.  The AM was recovering from a prior attempt when the first kill request arrived.

This message was sent by Atlassian JIRA
