Date: Tue, 5 Dec 2017 00:10:00 +0000 (UTC)
From: "Apache Spark (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-22162) Executors and the driver use inconsistent Job IDs during the new RDD commit protocol

    [ https://issues.apache.org/jira/browse/SPARK-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277795#comment-16277795 ]

Apache Spark commented on SPARK-22162:
--------------------------------------

User 'rezasafi' has created a pull request for this issue:
https://github.com/apache/spark/pull/19886

> Executors and the driver use inconsistent Job IDs during the new RDD commit protocol
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-22162
>                 URL: https://issues.apache.org/jira/browse/SPARK-22162
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Reza Safi
>            Assignee: Reza Safi
>             Fix For: 2.3.0
>
>
> After the SPARK-18191 commit in pull request 15769, which introduced the new commit protocol, it is possible for the driver and executors to use different job IDs during an RDD commit.
> In the old code, the variable stageId is part of the closure used to define the task, as you can see here:
> [https://github.com/apache/spark/blob/9c8deef64efee20a0ddc9b612f90e77c80aede60/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L1098]
> As a result, a TaskAttemptId is constructed on the executors using the same "stageId" as the driver, since it is a value that is serialized on the driver. Note that the value of this "stageId" is actually the rdd.id, which is assigned here:
> [https://github.com/apache/spark/blob/9c8deef64efee20a0ddc9b612f90e77c80aede60/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L1084]
> However, after the change in pull request 15769, the value is no longer part of the task closure that the driver serializes. Instead, it is pulled from the TaskContext, as you can see here:
> [https://github.com/apache/spark/pull/15769/files#diff-dff185cb90c666bce445e3212a21d765R103]
> and that value is then used to construct the TaskAttemptId on the executors:
> [https://github.com/apache/spark/pull/15769/files#diff-dff185cb90c666bce445e3212a21d765R134]
> The TaskContext carries a stageId value that is set by the DAGScheduler. So after the change, unlike the old code where an rdd.id was used, an actual stage.id is used, and that id can differ between the executors and the driver since it is no longer serialized from the driver.
> In summary, the old code consistently used the rddId and just incorrectly named it "stageId".
> The new code uses a mix of rddId and stageId. There should be one consistent ID between the executors and the driver.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
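The mechanism described in the issue can be sketched without Spark itself: a value captured in a task closure is serialized once on the driver and is therefore identical on every executor, while a value read from a per-executor context can diverge from what the driver used. The sketch below is purely illustrative and is not the actual PairRDDFunctions or commit-protocol code; FakeTaskContext and the concrete id values are invented for the example.

```scala
// Illustrative sketch only: models why a closure-captured id is consistent
// everywhere while a context-derived id can differ from the driver's value.
object JobIdConsistency {
  // Stand-in for the per-task context an executor provides (like TaskContext).
  final case class FakeTaskContext(stageId: Int)

  // Old protocol: the driver captures the RDD id in the task closure, so every
  // executor builds its TaskAttemptId from the same serialized number.
  def oldStyleTask(capturedRddId: Int): FakeTaskContext => Int =
    _ => capturedRddId // the closure ignores the executor-local context

  // New protocol (pre-fix): the executor reads the id from its own context,
  // which the scheduler sets to the stage id, not the rdd id the driver used.
  def newStyleTask: FakeTaskContext => Int =
    ctx => ctx.stageId

  def main(args: Array[String]): Unit = {
    val driverRddId = 42 // hypothetical rdd.id used on the driver
    val executorContexts = Seq(FakeTaskContext(stageId = 7), FakeTaskContext(stageId = 7))

    val oldIds = executorContexts.map(oldStyleTask(driverRddId))
    val newIds = executorContexts.map(newStyleTask)

    // Old code: every executor's id matches the driver's rdd.id.
    assert(oldIds.forall(_ == driverRddId))
    // New code: the id comes from the context's stageId and no longer
    // matches the rdd.id the driver used.
    assert(newIds.forall(_ == 7))
    println(s"old ids: $oldIds, new ids: $newIds")
  }
}
```

In this toy model, the fix amounts to going back to a single id that is decided on the driver and shipped to every task, which is what the linked pull request does for the commit protocol.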