hive-dev mailing list archives

From "Suhas Satish" <suhas.sat...@gmail.com>
Subject Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
Date Thu, 06 Nov 2014 03:26:10 GMT


> On Nov. 5, 2014, 9:23 p.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 254
> > <https://reviews.apache.org/r/27640/diff/1/?file=750693#file750693line254>
> >
> >     Are you sure we don't need to initialize the HTSOperator's values like it does in LocalMapJoinProcFactory?
> 
> Suhas Satish wrote:
>     I will take a closer look.

I dug into the history of this changeset a bit. 
It was introduced in this commit 
https://github.com/apache/hive/commit/9b4ba6a9bb2a1184857fc8cca11e3dc6c48c1380

According to one of the comments on HIVE-4867, there is a problem with mapjoin on Tez:
the MR compiler replaces the RS with a HashSink built from the value exprs of the Join,
but the Tez compiler uses the RS as is, assuming it has the same columns as the value
exprs of the Join, which is not true.

HIVE-4867 dedups columns in the RS for reducer joins and in the RS for order-by, but the
small aliases of a mapjoin in MR tasks still contain the key columns in their value exprs.

Not having this initialization can, at worst, cause a performance issue on memory (a
slightly larger footprint); it should not affect functionality.
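
As a rough illustration of the duplication described above (plain Java, not Hive code;
the class ValueExprDedup, the method dedupValueColumns, and the column names are all
hypothetical), a value list that still carries the key column is simply one column wider
than the deduplicated one:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ValueExprDedup {
    // Hypothetical helper, not part of Hive: returns the value columns
    // that do not already appear among the key columns, preserving order.
    static List<String> dedupValueColumns(List<String> keyCols, List<String> valueCols) {
        Set<String> keys = new LinkedHashSet<>(keyCols);
        List<String> result = new ArrayList<>();
        for (String col : valueCols) {
            if (!keys.contains(col)) {
                result.add(col);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example schema: "id" is the join key and is also
        // repeated in the value columns of the small table.
        List<String> keyCols = List.of("id");
        List<String> valueCols = List.of("id", "name", "price");

        System.out.println("values without dedup: " + valueCols);
        System.out.println("values with dedup:    " + dedupValueColumns(keyCols, valueCols));
    }
}
```

Either value list yields the same join result, since the key is still available from the
key side; the wider one only costs extra memory per hashed row.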


- Suhas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27640/#review60031
-----------------------------------------------------------


On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27640/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2014, 8:29 p.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This replaces ReduceSinks with HashTableSinks in smaller tables for a map-join. But the
> condition check field to detect map-join is actually being set in CommonJoinResolver, which
> doesn't exist yet. We need to decide where the right place to populate this field is.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 795a5d7 
> 
> Diff: https://reviews.apache.org/r/27640/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Suhas Satish
> 
>

