Date: Tue, 23 Feb 2016 23:10:18 +0000 (UTC)
From: "Bikas Saha (JIRA)"
To: issues@tez.apache.org
Reply-To: dev@tez.apache.org
Subject: [jira] [Commented] (TEZ-3102) Fetch failure of a speculated task causes job hang

    [ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159835#comment-15159835 ]

Bikas Saha commented on TEZ-3102:
---------------------------------

+1. I think testTaskSucceedAndRetroActiveFailure() should be covering the new code changes in the success attempt code path. In the small chance that it's not, would you please update the test? Thanks!

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and the other killed. Then if the task retroactively fails due to fetch failures the Tez AM will fail to reschedule another task. This results in a hung job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)