Subject: Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
From: "Chao Sun"
To: "Xuefu Zhang", "hive", "Chao Sun"
Date: Sun, 09 Nov 2014 05:56:57 -0000
Message-ID: <20141109055657.10454.884@reviews.apache.org>
In-Reply-To: <20141108151548.12275.3579@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/27627/
X-ReviewRequest-Repository: hive-git

> On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214
> >
> > This assumes that result SparkWorks will be linearly dependent on each other, which isn't true in general. Let's say there are two works (w1 and w2), each having a map join operator. w1 and w2 are connected to w3 via HTS. w3 also contains a map join operator. Dependency in this scenario will be graphic rather than linear.
>
> Chao Sun wrote:
>     I was thinking, in this case, if there's no dependency between w1 and w2, they can be put in the same SparkWork, right?
>     Otherwise, they will form a linear dependency too.
>
> Xuefu Zhang wrote:
>     w1 and w2 are fine. They will be in the same SparkWork. This SparkWork will depend on both the SparkWork generated at w1 and the SparkWork generated at w2. This dependency is not linear.
>
>     To put more details, for each work that has a map join op, we need to create a SparkWork to handle its small tables. So, both w1 and w2 will need to create such a SparkWork. While w1 and w2 are in the same SparkWork, this SparkWork depends on the two SparkWorks created.

I'm not getting it. Why is "this dependency not linear"? Can you give a counterexample?
Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) are like the following:

  HTS_1   HTS_2     HTS_3   HTS_4
     \     /           \     /
      \   /             \   /
      MJ_1              MJ_2
        |                 |
        |                 |
      HTS_5             HTS_6
         \               /
          \             /
           \           /
            \         /
             \       /
               MJ_3

Then, what I'm doing is to put HTS_1, HTS_2, HTS_3, and HTS_4 in the same SparkWork, say SW_1. Then MJ_1, MJ_2, HTS_5, and HTS_6 will be in another SparkWork, SW_2, and MJ_3 in another SparkWork, SW_3.
SW_1 -> SW_2 -> SW_3.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/#review60482
-----------------------------------------------------------


On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27627/
> -----------------------------------------------------------
>
> (Updated Nov. 7, 2014, 6:07 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-8622
>     https://issues.apache.org/jira/browse/HIVE-8622
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This is a sub-task of map-join for spark
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6
>
> Diff: https://reviews.apache.org/r/27627/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Chao Sun
>