hive-dev mailing list archives

From "Szehon Ho" <sze...@cloudera.com>
Subject Re: Review Request 24919: HIVE-7815 : Reduce Side Join with single reducer [Spark Branch]
Date Thu, 21 Aug 2014 22:44:57 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24919/
-----------------------------------------------------------

(Updated Aug. 21, 2014, 10:44 p.m.)


Review request for hive and Brock Noland.


Changes
-------

Thanks, Brock, for the suggestion.  No, I don't mind; I'm happy to do more unrelated cleanup of
that class.


Bugs: HIVE-7815
    https://issues.apache.org/jira/browse/HIVE-7815


Repository: hive-git


Description
-------

This is the first part of the reduce-side join work.  See HIVE-7384 for the overall design
doc.

This patch inserts a UnionTran after the two join inputs, and thus leverages the Union-all
code path to run the Spark RDD.  I also made the following changes:

1.  Some API cleanup of GraphTran.  connect now adds the child automatically, so there is no
need for multiple calls.
2.  Fix a bug in HiveBaseReduceFunction.  HIVE-7652 made the iterator return false after close
if there are more rows, so Spark calls hasNext again and close thus gets called twice.
CommonJoinOperator throws an exception if close gets called more than once, so I added a check
there.
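
To illustrate the two changes above, here is a minimal sketch in plain Java.  It is not the
code from this patch: SimpleGraph, StrictCloseable and GuardedResultIterator are hypothetical
names made up for the example, and the real GraphTran and HiveBaseReduceFunction classes have
different signatures.  The sketch only shows the two ideas: a connect call that registers the
child by itself, and a guard that keeps close from reaching the operator a second time when
hasNext is called again after the rows are exhausted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical graph: connect() registers both endpoints, so callers
// do not need a separate "add child" call.
class SimpleGraph {
    private final Map<String, List<String>> children = new LinkedHashMap<>();

    void connect(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        // Make sure the child is known to the graph even if it has no children yet.
        children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    boolean contains(String node) {
        return children.containsKey(node);
    }

    List<String> childrenOf(String node) {
        return children.getOrDefault(node, Collections.<String>emptyList());
    }
}

// Hypothetical stand-in for an operator that rejects a second close().
class StrictCloseable {
    private boolean closed = false;
    int closeCalls = 0;

    void close() {
        if (closed) {
            throw new IllegalStateException("close() called more than once");
        }
        closed = true;
        closeCalls++;
    }
}

// Hypothetical result iterator: closes the operator when the rows run out,
// and tolerates further hasNext() calls after that point.
class GuardedResultIterator implements Iterator<String> {
    private final Iterator<String> rows;
    private final StrictCloseable operator;
    private boolean closed = false;

    GuardedResultIterator(Iterator<String> rows, StrictCloseable operator) {
        this.rows = rows;
        this.operator = operator;
    }

    @Override
    public boolean hasNext() {
        if (rows.hasNext()) {
            return true;
        }
        closeOnce();
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return rows.next();
    }

    // The guard: only the first call reaches the operator.
    private void closeOnce() {
        if (!closed) {
            closed = true;
            operator.close();
        }
    }
}
```

In this sketch the guard lives in the iterator; the same boolean check could equally sit in
whichever class owns the close call.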


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 63af01d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java 03f0ff8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 6568a76
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java d16f1be 
  ql/src/test/results/clientpositive/spark/join0.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join_casesensitive.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/24919/diff/


Testing
-------

Added three join tests to the TestSparkCliDriver suite.


Thanks,

Szehon Ho

