hive-dev mailing list archives

From "Szehon Ho" <>
Subject Re: Review Request 24919: HIVE-7815 : Reduce Side Join with single reducer [Spark Branch]
Date Thu, 21 Aug 2014 22:44:57 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Aug. 21, 2014, 10:44 p.m.)

Review request for hive and Brock Noland.


Thanks, Brock, for the suggestion.  No, I don't mind; I'm happy to do more unrelated cleanup of
that class.

Bugs: HIVE-7815

Repository: hive-git


This is the first part of the reduce-side join work.  See HIVE-7384 for the overall design.

This patch inserts a UnionTran after the two join inputs, and thus leverages the Union-all
code path to run the Spark RDD.  I also made the following changes:

1.  Some API cleanup of GraphTran.  Connect now adds the child automatically, so there is no
need for multiple calls.
2.  Fix a bug in HiveBaseReduceFunction.  HIVE-7652 made the iterator return false after close
if there are more rows, so Spark calls hasNext again and close thus gets called twice.
CommonJoinOperator throws an exception if close is called more than once, so I added a check
there.
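
The double-close problem in item 2 can be illustrated with a minimal, self-contained sketch.
This is not the code from the patch: StrictOperator and GuardedResultIterator are hypothetical
stand-ins for an operator that rejects a second close and for a result iterator that closes
it. The sketch only shows the general idea of a guard flag, so that repeated hasNext calls
after the input is exhausted close the operator at most once.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for an operator that rejects a second close(). */
class StrictOperator {
    private boolean closed = false;
    int closeCalls = 0; // counts successful close() calls, for illustration only

    void close() {
        if (closed) {
            throw new IllegalStateException("close() called more than once");
        }
        closed = true;
        closeCalls++;
    }
}

/** Hypothetical result iterator that closes the operator once its input is exhausted. */
class GuardedResultIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final StrictOperator operator;
    private boolean finished = false; // guard: set once the operator has been closed

    GuardedResultIterator(List<String> rows, StrictOperator operator) {
        this.input = rows.iterator();
        this.operator = operator;
    }

    @Override
    public boolean hasNext() {
        if (input.hasNext()) {
            return true;
        }
        // The caller may invoke hasNext() again after it has returned false;
        // the guard ensures close() runs only on the first such call.
        if (!finished) {
            finished = true;
            operator.close();
        }
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return input.next();
    }
}
```

Without the finished flag, the second hasNext call after exhaustion would reach close again
and trigger the exception in this stand-in operator.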

Diffs (updated)

  itests/src/test/resources/ 63af01d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ 03f0ff8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ 6568a76 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ d16f1be 
  ql/src/test/results/clientpositive/spark/join0.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join_casesensitive.q.out PRE-CREATION 



Added three join tests to the TestSparkCliDriver suite.


Szehon Ho
