Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative;
 boundary="===============8077538067588280632=="
MIME-Version: 1.0
Subject: Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3
 stages [Spark Branch]
From: "Chao Sun" <chao.sun@cloudera.com>
To: "Xuefu Zhang" <xzhang@cloudera.com>, "hive" <dev@hive.apache.org>,
 "Chao Sun" <chao.sun@cloudera.com>
Date: Sun, 09 Nov 2014 05:56:57 -0000
Message-ID: <20141109055657.10454.884@reviews.apache.org>
Auto-Submitted: auto-generated
Sender: "Chao Sun" <noreply@reviews.apache.org>
References: <20141108151548.12275.3579@reviews.apache.org>
In-Reply-To: <20141108151548.12275.3579@reviews.apache.org>
Reply-To: "Chao Sun" <chao.sun@cloudera.com>

--===============8077538067588280632==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


> On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214
> > <https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214>
> >
> >     This assumes that the resulting SparkWorks will be linearly dependent on each other, which isn't true in general. Let's say there are two works (w1 and w2), each having a map join operator. w1 and w2 are connected to w3 via HTS. w3 also contains a map join operator. The dependency in this scenario will be a graph rather than a linear chain.
> 
> Chao Sun wrote:
>     I was thinking, in this case, if there's no dependency between w1 and w2, they can be put in the same SparkWork, right?
>     Otherwise, they will form a linear dependency too.
> 
> Xuefu Zhang wrote:
>     w1 and w2 are fine; they will be in the same SparkWork. This SparkWork will depend on both the SparkWork generated at w1 and the SparkWork generated at w2. This dependency is not linear.
>     
>     To give more detail: for each work that has a map join op, we need to create a SparkWork to handle its small tables. So both w1 and w2 will need to create such a SparkWork. While w1 and w2 are in the same SparkWork, this SparkWork depends on the two SparkWorks created.

I'm not getting it. Why is "this dependency is not linear"? Can you give a counterexample?
Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) are laid out as follows:

     HTS_1   HTS_2     HTS_3    HTS_4
       \      /           \     /
        \    /             \   /
          MJ_1              MJ_2
           |                 |
           |                 |
          HTS_5            HTS_6
              \            /
               \          /
                \        /
                 \      /
                  \    /
                    MJ_3
                    
Then what I'm doing is putting HTS_1, HTS_2, HTS_3, and HTS_4 in the same SparkWork, say SW_1.
MJ_1, MJ_2, HTS_5, and HTS_6 go in a second SparkWork, SW_2, and MJ_3 goes in a third SparkWork, SW_3.
The resulting dependency is SW_1 -> SW_2 -> SW_3, which is linear.
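
To make the grouping concrete, here is a rough sketch in plain Java. It is not code from the patch: the class SparkWorkGrouping, its methods, and the string-based operator names are all hypothetical, and it assumes an operator counts as a map join or a HashTableSink purely by its "MJ"/"HTS" name prefix. Each operator gets a level equal to the number of HTS -> MJ edges on the longest path leading to it, and operators with the same level land in the same SparkWork. It only reproduces the example above; it does not settle whether one SparkWork per level is the right split in general.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from the patch.
class SparkWorkGrouping {

  // Level of an operator = number of HTS -> MJ edges on the longest path
  // from any root down to it. Results are memoized per operator.
  static int level(String op, Map<String, List<String>> parents, Map<String, Integer> memo) {
    Integer cached = memo.get(op);
    if (cached != null) {
      return cached;
    }
    int result = 0;
    for (String p : parents.getOrDefault(op, List.of())) {
      // Crossing from a HashTableSink into a map join starts a new SparkWork.
      int boundary = (p.startsWith("HTS") && op.startsWith("MJ")) ? 1 : 0;
      result = Math.max(result, level(p, parents, memo) + boundary);
    }
    memo.put(op, result);
    return result;
  }

  // Groups operators by level; TreeMap keeps the levels in ascending order.
  static Map<Integer, List<String>> group(Map<String, List<String>> parents) {
    Map<String, Integer> memo = new HashMap<>();
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (String op : parents.keySet()) {
      groups.computeIfAbsent(level(op, parents, memo), k -> new ArrayList<>()).add(op);
    }
    return groups;
  }

  // The operator graph from the diagram above, as child -> list of parents.
  static Map<String, List<String>> example() {
    Map<String, List<String>> parents = new LinkedHashMap<>();
    for (String hts : List.of("HTS_1", "HTS_2", "HTS_3", "HTS_4")) {
      parents.put(hts, List.of());
    }
    parents.put("MJ_1", List.of("HTS_1", "HTS_2"));
    parents.put("MJ_2", List.of("HTS_3", "HTS_4"));
    parents.put("HTS_5", List.of("MJ_1"));
    parents.put("HTS_6", List.of("MJ_2"));
    parents.put("MJ_3", List.of("HTS_5", "HTS_6"));
    return parents;
  }

  public static void main(String[] args) {
    // Level 0 corresponds to SW_1, level 1 to SW_2, and so on.
    group(example()).forEach(
        (lvl, ops) -> System.out.println("SW_" + (lvl + 1) + ": " + ops));
  }
}
```

Because the levels are plain integers, the groups are totally ordered, which is why the chain comes out as SW_1 -> SW_2 -> SW_3 in this example.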


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/#review60482
-----------------------------------------------------------


On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27627/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 6:07 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8622
>     https://issues.apache.org/jira/browse/HIVE-8622
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This is a sub-task of map-join for Spark:
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27627/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chao Sun
> 
>


--===============8077538067588280632==--