Date: Tue, 6 Jul 2021 06:20:00 +0000 (UTC)
From: "Yingda Chen (Jira)"
To: issues@tez.apache.org
Reply-To: dev@tez.apache.org
Subject: [jira] [Commented] (TEZ-4317) Tez job can hang if new allocated container released because of speculative attempts avoid running on the same node

[ https://issues.apache.org/jira/browse/TEZ-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375251#comment-17375251 ]

Yingda Chen commented on TEZ-4317:
----------------------------------

[~wei.wei], could you please be more specific about what you believe is causing the problem? Or, better yet, provide the relevant AM log to allow further analysis?

Looking at the code, we could not identify how a job can hang because a speculative attempt avoids a problematic node.

> Tez job can hang if new allocated container released because of speculative attempts avoid running on the same node
> -------------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-4317
> URL: https://issues.apache.org/jira/browse/TEZ-4317
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.2
> Reporter: wei
> Priority: Major
>
> Assume that a task attempt is running, e.g. TA01.
> If a speculative task attempt is then scheduled with an allocated container on the same host as TA01, the newly allocated container is released because of [TEZ-4042|https://issues.apache.org/jira/browse/TEZ-4042] and no new resource request is added.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)