tez-issues mailing list archives

From "Yingda Chen (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-4317) Tez job can hang if new allocated container released because of speculative attempts avoid running on the same node
Date Tue, 06 Jul 2021 06:20:00 GMT

    [ https://issues.apache.org/jira/browse/TEZ-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375251#comment-17375251 ]

Yingda Chen commented on TEZ-4317:
----------------------------------

[~wei.wei], could you please be more specific about what you believe is causing the problem?
Or better yet, could you provide the relevant AM log to allow further analysis?

Looking at the code, we cannot see how a job could hang because a speculative attempt avoids
the problematic node.

> Tez job can hang if new allocated container released because of speculative attempts avoid running on the same node
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-4317
>                 URL: https://issues.apache.org/jira/browse/TEZ-4317
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.2
>            Reporter: wei
>            Priority: Major
>
> Assume a task attempt is running, e.g. TA01.
> A speculative attempt of the same task is then scheduled and is allocated a container on the same host as TA01.
> This newly allocated container is released because of [TEZ-4042|https://issues.apache.org/jira/browse/TEZ-4042],
> and no new resource request is added.
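To make the reporter's claim concrete, here is a toy model of the sequence described above. It assumes, as the report states, that releasing the same-host container is not followed by a new resource request, and that the resource manager only hands out containers while a request is outstanding. The class `SpeculativeAllocationModel`, the method `scheduleSpeculative` and the flag `reRequestOnRelease` are hypothetical and are not Tez or YARN APIs; the sketch only shows why such a sequence would leave the speculative attempt waiting forever, not whether Tez's scheduler actually behaves this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of the TEZ-4317 scenario; none of these names are Tez APIs.
public class SpeculativeAllocationModel {

    /**
     * Simulates scheduling a single speculative attempt.
     *
     * @param offers             hosts of containers the (simulated) RM hands out, in order
     * @param runningHost        host where the original attempt (e.g. TA01) is running
     * @param reRequestOnRelease whether a new request is issued after a same-host container is released
     * @return the host the speculative attempt runs on, or null if it never gets a container
     */
    public static String scheduleSpeculative(Deque<String> offers, String runningHost,
                                             boolean reRequestOnRelease) {
        int outstandingRequests = 1; // one container request for the speculative attempt
        // Assumption: the RM only allocates while a request is outstanding.
        while (outstandingRequests > 0 && !offers.isEmpty()) {
            String host = offers.poll();
            outstandingRequests--; // this allocation satisfies the request
            if (host.equals(runningHost)) {
                // Avoid running next to the original attempt: release the container.
                if (reRequestOnRelease) {
                    outstandingRequests++; // ask for a replacement container
                }
                continue;
            }
            return host; // speculative attempt launched on a different host
        }
        return null; // no outstanding request and no container: the attempt waits forever
    }

    public static void main(String[] args) {
        // Without a re-request, releasing the host1 container leaves nothing outstanding.
        System.out.println(scheduleSpeculative(
                new ArrayDeque<>(List.of("host1", "host2")), "host1", false)); // prints "null"
        // With a re-request, the next offer (host2) is accepted.
        System.out.println(scheduleSpeculative(
                new ArrayDeque<>(List.of("host1", "host2")), "host1", true)); // prints "host2"
    }
}
```

In this model the difference between a stuck attempt and a launched one is solely whether a replacement request is issued after the release, which is the point the AM log would need to confirm or rule out.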



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
