tez-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-3102) Fetch failure of a speculated task causes job hang
Date Wed, 24 Feb 2016 20:03:18 GMT

     [ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated TEZ-3102:
    Attachment: TEZ-3102.003.patch

Thanks for the reviews, Bikas!

testTaskSucceedAndRetroActiveFailure doesn't cover the change since it's using the failed
transition rather than the killed transition, so I added a test that explicitly kills a successful
attempt to verify that the task reverts to scheduling a new attempt.
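
For readers unfamiliar with the scenario, the behavior under test can be sketched
roughly as follows. This is a simplified, hypothetical model and not the code in the
patch: TaskSketch, its enums, and its methods are invented for illustration and do not
correspond to Tez classes. It assumes that a kill and a retroactive failure of the
successful attempt should both send the task back to scheduling a new attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a task with speculative attempts.
// None of these names come from the Tez code base.
class TaskSketch {
    enum TaskState { RUNNING, SUCCEEDED }
    enum AttemptState { RUNNING, SUCCEEDED, KILLED, FAILED }

    TaskState state = TaskState.RUNNING;
    final List<AttemptState> attempts = new ArrayList<>();
    int successfulAttempt = -1;

    // Launch a new attempt and return its index.
    int schedule() {
        attempts.add(AttemptState.RUNNING);
        return attempts.size() - 1;
    }

    // The first attempt to finish wins; any other running attempt is killed.
    void succeed(int id) {
        if (state != TaskState.RUNNING || attempts.get(id) != AttemptState.RUNNING) {
            return;
        }
        for (int i = 0; i < attempts.size(); i++) {
            if (i != id && attempts.get(i) == AttemptState.RUNNING) {
                attempts.set(i, AttemptState.KILLED);
            }
        }
        attempts.set(id, AttemptState.SUCCEEDED);
        successfulAttempt = id;
        state = TaskState.SUCCEEDED;
    }

    // The successful attempt is later killed or fails (for example after
    // fetch failures). Assumption: both cases must reschedule, otherwise
    // the task has no live attempt and the job hangs.
    void retroactivelyEnd(int id, AttemptState terminal) {
        if (terminal != AttemptState.KILLED && terminal != AttemptState.FAILED) {
            throw new IllegalArgumentException("terminal must be KILLED or FAILED");
        }
        if (state == TaskState.SUCCEEDED && id == successfulAttempt) {
            attempts.set(id, terminal);
            successfulAttempt = -1;
            state = TaskState.RUNNING;
            schedule();
        }
    }

    // Number of attempts that are still running.
    long runningAttempts() {
        return attempts.stream().filter(a -> a == AttemptState.RUNNING).count();
    }
}
```

In this sketch, scheduling two attempts, letting the first succeed, and then calling
retroactivelyEnd on it with KILLED leaves the task RUNNING with exactly one live
attempt, which is the property the new test is described as checking.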

The reported test failures appear to be unrelated, as they pass for me locally.

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch, TEZ-3102.003.patch
> If a task speculates then succeeds, one task will be marked successful and the other
> killed. Then if the task retroactively fails due to fetch failures the Tez AM will fail to
> reschedule another task. This results in a hung job.

This message was sent by Atlassian JIRA