hive-dev mailing list archives

From "Jimmy Xiang" <jxi...@cloudera.com>
Subject Re: Review Request 27745: HIVE-8621 Dump small table join data for map-join [Spark Branch]
Date Fri, 07 Nov 2014 23:54:50 GMT


> On Nov. 7, 2014, 9:51 p.m., Suhas Satish wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java, line 314
> > <https://reviews.apache.org/r/27745/diff/1/?file=754765#file754765line314>
> >
> >     What if there are 2 partitions for the big table? I guess they will then be processed on 2 separate Spark nodes, right?
> >     
> >     So in this case, there are 2 replicas created for this HashTableSink. How do we ensure that these 2 replicas end up on the same data nodes where the 2 big table partitions will be processing the map-joins?

We can't, if we don't know where the big table partitions are. If there are just two partitions,
copying the small table to more nodes may take more time than fetching the data over the
network.
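
To make the trade-off concrete, here is a back-of-envelope sketch in Java. It is not part of the patch: the class name, the method names, the table size, and the bandwidth are all hypothetical, and the model assumes every copy and every remote read moves the whole dump sequentially at the same rate.

```java
// Back-of-envelope cost model. Every name and number below is a hypothetical
// assumption for illustration; none of it comes from the HIVE-8621 patch.
public class SmallTableCopyTradeoff {

    // Seconds spent pushing the small-table dump to extra nodes,
    // assuming one full sequential copy per additional replica.
    public static double replicationSeconds(long tableBytes, int extraReplicas,
                                            double bytesPerSecond) {
        return extraReplicas * (tableBytes / bytesPerSecond);
    }

    // Seconds spent by big-table tasks reading the dump over the network,
    // assuming one full sequential read per task that has no local copy.
    public static double remoteFetchSeconds(long tableBytes, int remoteTasks,
                                            double bytesPerSecond) {
        return remoteTasks * (tableBytes / bytesPerSecond);
    }

    // True only when the extra copies cost less than the remote reads they avoid.
    public static boolean extraReplicationPaysOff(long tableBytes, int extraReplicas,
                                                  int remoteTasks, double bytesPerSecond) {
        return replicationSeconds(tableBytes, extraReplicas, bytesPerSecond)
                < remoteFetchSeconds(tableBytes, remoteTasks, bytesPerSecond);
    }

    public static void main(String[] args) {
        long tableBytes = 100L * 1024 * 1024;        // assumed dump size: 100 MiB
        double bytesPerSecond = 100.0 * 1024 * 1024; // assumed bandwidth: 100 MiB/s
        int extraReplicas = 10;                      // assumed extra copies of the dump
        int remoteTasks = 2;                         // two big table partitions, both remote

        System.out.println("replicate: "
                + replicationSeconds(tableBytes, extraReplicas, bytesPerSecond) + " s");
        System.out.println("remote fetch: "
                + remoteFetchSeconds(tableBytes, remoteTasks, bytesPerSecond) + " s");
        System.out.println("extra replication pays off: "
                + extraReplicationPaysOff(tableBytes, extraReplicas, remoteTasks, bytesPerSecond));
    }
}
```

Under these assumptions, with only two big table partitions, at most two tasks ever read the dump remotely, so any number of extra copies above two costs more than it saves. A real cluster copies and reads in parallel, so this is only a way to see the shape of the argument.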


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27745/#review60388
-----------------------------------------------------------


On Nov. 7, 2014, 9:34 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27745/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 9:34 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8621
>     https://issues.apache.org/jira/browse/HIVE-8621
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In the Spark case, HashTableSinkOperator should dump files to a folder expected by HashTableLoader.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java f0e04e7 
> 
> Diff: https://reviews.apache.org/r/27745/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>

