From issues-return-131373-archive-asf-public=cust-asf.ponee.io@hive.apache.org Fri Aug 3 18:48:05 2018
From: "Hui Huang (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Date: Fri, 3 Aug 2018 16:48:00 +0000 (UTC)
Subject: [jira] [Updated] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

     [ https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hui Huang updated HIVE-20304:
-----------------------------
    Attachment: HIVE-20304.1.patch

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20304
>                 URL: https://issues.apache.org/jira/browse/HIVE-20304
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.2.1, 2.3.3
>            Reporter: Hui Huang
>            Assignee: Hui Huang
>            Priority: Major
>             Fix For: 1.2.1, 2.3.3
>
>         Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is set to mr, the same stage of a query may be launched twice because of the incorrectly generated plan. If hive.exec.parallel is also true, the duplicated stages will launch at the same time and the job will fail, because the first stage to complete clears the map.xml/reduce.xml plan files stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
>
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
>
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
>
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
>
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
>
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
>
> SELECT dt,
>        platform,
>        groupname,
>        count(1) as cnt
> FROM
>   (SELECT dt,
>           platform,
>           groupname
>    FROM
>      (SELECT fence
>       FROM tbl1) ta
>    JOIN
>      (SELECT a0.dt,
>              a1.platform,
>              a2.groupname,
>              a3.fence
>       FROM
>         (SELECT dt,
>                 order_id,
>                 phone,
>                 search_id
>          FROM tbl2
>          WHERE dt = 20180703) a0
>       JOIN
>         (SELECT order_id,
>                 platform,
>                 dt
>          FROM tbl3
>          WHERE dt = 20180703) a1 ON a0.order_id = a1.order_id
>       INNER JOIN
>         (SELECT groupname,
>                 phone,
>                 dt
>          FROM tbl4
>          WHERE dt = 20180703) a2 ON a0.phone = a2.phone
>       LEFT JOIN
>         (SELECT search_id,
>                 fence,
>                 dt
>          FROM tbl5
>          WHERE dt = 20180703) a3 ON a0.search_id = a3.search_id) t0 ON ta.fence = t0.fence) t11
> GROUP BY dt,
>          platform,
>          groupname;
>
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get an error message like this:
> Examining task ID: task_1531284442065_3637_m_000000 (and more) from job job_1531284442065_3637
> Task with the most failures(4):
> -----
> Task ID:
>   task_1531284442065_3637_m_000000
> URL:
>   http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_000000
> -----
> Diagnostic Messages for this Task:
> File does not exist: hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan produced by EXPLAIN, I found that Stage-4 and Stage-5 can be reached from multiple root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of Stage-31, Stage-2
>   Stage-31
>   Stage-9 depends on stages: Stage-31
>   Stage-2 depends on stages: Stage-9
>   Stage-33 has a backup stage: Stage-1
>   Stage-16 depends on stages: Stage-33
>   Stage-1
>   Stage-5
>   Stage-27 is a root stage , consists of Stage-37, Stage-38, Stage-4
>   Stage-37 has a backup stage: Stage-4
>   Stage-25 depends on stages: Stage-37
>   Stage-12 depends on stages: Stage-4, Stage-22, Stage-23, Stage-25, Stage-26 , consists of Stage-36, Stage-5
>   Stage-36
>   Stage-11 depends on stages: Stage-36
>   Stage-19 depends on stages: Stage-11 , consists of Stage-35, Stage-5
>   Stage-35 has a backup stage: Stage-5
>   Stage-18 depends on stages: Stage-35
>   Stage-38 has a backup stage: Stage-4
>   Stage-26 depends on stages: Stage-38
>   Stage-4
>   Stage-30 is a root stage , consists of Stage-42, Stage-43, Stage-3
>   Stage-42 has a backup stage: Stage-3
>   Stage-28 depends on stages: Stage-42
>   Stage-14 depends on stages: Stage-3, Stage-28, Stage-29 , consists of Stage-41, Stage-4
>   Stage-41
>   Stage-13 depends on stages: Stage-41
>   Stage-24 depends on stages: Stage-13 , consists of Stage-39, Stage-40, Stage-4
>   Stage-39 has a backup stage: Stage-4
>   Stage-22 depends on stages: Stage-39
>   Stage-40 has a backup stage: Stage-4
>   Stage-23 depends on stages: Stage-40
>   Stage-43 has a backup stage: Stage-3
>   Stage-29 depends on stages: Stage-43
>   Stage-3
>   Stage-0 depends on stages: Stage-2
> {code}
> After the skew-join optimization, the processed task is added to the listTasks of a ConditionalTask and its parentTask is removed; then, during the common-join optimization of that ConditionalTask's listTasks, the newly generated conditional task is added to the root task list because its parentTask is null. As a result, the same stage becomes reachable from more than one root task.
> Workaround: do not set hive.optimize.skewjoin and hive.auto.convert.join to true at the same time.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
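To make the reported root cause easier to follow, here is a minimal, self-contained sketch of the plan-graph manipulation described above. It deliberately uses simplified stand-in classes (Task, ConditionalTask, and SkewJoinPlanSketch are illustrative names only, not Hive's actual implementation) and shows only how cutting the parent link during the skew-join rewrite can make the later map-join conditional task look like a new root:

{code:java}
// Simplified model of the plan graph; NOT Hive's real Task/ConditionalTask classes.
import java.util.ArrayList;
import java.util.List;

public class SkewJoinPlanSketch {

    static class Task {
        final String name;
        final List<Task> parentTasks = new ArrayList<>();
        final List<Task> childTasks = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    static class ConditionalTask extends Task {
        final List<Task> listTasks = new ArrayList<>();
        ConditionalTask(String name) { super(name); }
    }

    public static void main(String[] args) {
        List<Task> rootTasks = new ArrayList<>();

        // A join stage that originally hangs off another stage.
        Task parentStage = new Task("Stage-20");
        Task joinStage = new Task("Stage-5");
        parentStage.childTasks.add(joinStage);
        joinStage.parentTasks.add(parentStage);
        rootTasks.add(parentStage);

        // Skew-join optimization: the processed node is moved into the
        // listTasks of a conditional task and its parent link is cut.
        ConditionalTask skewCond = new ConditionalTask("skew-join conditional");
        skewCond.listTasks.add(joinStage);
        joinStage.parentTasks.clear();          // parentTask is now gone

        // Common-join (map-join) optimization later processes the listTasks.
        // Because the node no longer has a parent, the newly generated
        // conditional task is treated as a root task instead of being
        // attached under Stage-20.
        ConditionalTask mapJoinCond = new ConditionalTask("map-join conditional");
        mapJoinCond.listTasks.add(joinStage);
        if (joinStage.parentTasks.isEmpty()) {
            rootTasks.add(mapJoinCond);         // second path to the same stage
        }

        // Stage-5 is now reachable from two root tasks, so the MR driver can
        // launch it twice; with hive.exec.parallel=true both copies run at
        // once and the first to finish deletes the shared plan files.
        System.out.println("root tasks: " + rootTasks.size()); // prints 2
    }
}
{code}

Running the sketch prints two root tasks, mirroring how Stage-4 and Stage-5 end up reachable from multiple root stages in the EXPLAIN output above.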