hive-dev mailing list archives

From "Szehon Ho" <sze...@cloudera.com>
Subject Review Request 29281: HIVE-8640 : Support hints of SMBJoin [Spark Branch]
Date Sat, 20 Dec 2014 00:01:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29281/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-8640
    https://issues.apache.org/jira/browse/HIVE-8640


Repository: hive-git


Description
-------

This change follows the same principle as the refactoring in HIVE-8639: move as much of the
join optimization as possible into the same traversal, and in fact into the same process(joinOp)
method, both to simplify the logic and to improve compiler performance.

While it is too hard to bring SparkMapJoinProcessor (for mapjoin hints) to the same level,
due to the way it was written (see HIVE-8911), it is possible to do so for the bucket join and
SMB join hints.  This change introduces a parallel processor called 'SparkJoinHintOptimizer',
which takes a mapjoin that SparkMapJoinProcessor has already converted and converts it further
to a bucket or SMB join as appropriate.  It runs alongside 'SparkJoinOptimizer', which takes a
common join operator and handles the auto-conversion to mapjoin/bucketJoin/SMBJoin.
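A rough way to picture the split described above: one entry point per join operator that routes
hinted mapjoins to the hint path and common joins to the auto-conversion path.  This is a
hypothetical toy model, not Hive code: JoinDispatchSketch, JoinKind, JoinOp, the boolean flags
and the conversion rules are all assumptions chosen only to show the routing, and they ignore
the real size checks and configuration switches.

```java
/**
 * Hypothetical toy model of the two join processors. None of these classes or
 * rules are Hive's API; they only illustrate routing from a single process() call.
 */
public class JoinDispatchSketch {

    enum JoinKind { COMMON_JOIN, MAP_JOIN, BUCKET_MAP_JOIN, SMB_JOIN }

    /** Stand-in for a join operator plus the facts the optimizers consult (assumed flags). */
    static final class JoinOp {
        JoinKind kind;
        final boolean bucketedOnKey;  // all inputs bucketed on the join key
        final boolean sortedOnKey;    // all inputs also sorted on the join key
        final boolean smallTablesFit; // small tables fit in memory

        JoinOp(JoinKind kind, boolean bucketedOnKey, boolean sortedOnKey, boolean smallTablesFit) {
            this.kind = kind;
            this.bucketedOnKey = bucketedOnKey;
            this.sortedOnKey = sortedOnKey;
            this.smallTablesFit = smallTablesFit;
        }
    }

    /** Plays the role of SparkJoinOptimizer: auto-converts a common join (toy rules). */
    static JoinKind autoConvert(JoinOp op) {
        if (op.bucketedOnKey && op.sortedOnKey) return JoinKind.SMB_JOIN;
        if (!op.smallTablesFit) return JoinKind.COMMON_JOIN;
        return op.bucketedOnKey ? JoinKind.BUCKET_MAP_JOIN : JoinKind.MAP_JOIN;
    }

    /** Plays the role of SparkJoinHintOptimizer: upgrades an already-hinted mapjoin (toy rules). */
    static JoinKind upgradeHintedMapJoin(JoinOp op) {
        // The hint already forced a mapjoin, so memory fit is not re-checked in this toy.
        if (op.bucketedOnKey && op.sortedOnKey) return JoinKind.SMB_JOIN;
        return op.bucketedOnKey ? JoinKind.BUCKET_MAP_JOIN : JoinKind.MAP_JOIN;
    }

    /** Single process(joinOp)-style entry point shared by both paths. */
    static void process(JoinOp op) {
        switch (op.kind) {
            case COMMON_JOIN:
                op.kind = autoConvert(op);
                break;
            case MAP_JOIN:
                op.kind = upgradeHintedMapJoin(op);
                break;
            default:
                break; // already a bucket or SMB join: nothing to do
        }
    }

    public static void main(String[] args) {
        JoinOp hinted = new JoinOp(JoinKind.MAP_JOIN, true, true, false);
        process(hinted);
        System.out.println(hinted.kind); // SMB_JOIN

        JoinOp common = new JoinOp(JoinKind.COMMON_JOIN, false, false, true);
        process(common);
        System.out.println(common.kind); // MAP_JOIN
    }
}
```

The point of the sketch is only that both paths hang off the same per-join call, so a single
traversal visits each join once.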

As Chao found, the one difference between mapjoin/bucketJoin and SMB join is that Spark mapjoins
and bucket joins expect an RS (ReduceSink) operator on each small-table branch, while SMB join
does not.  So I added a class, SparkSMBJoinHintOptimizer, that first removes these RS operators
and then reuses the rest of the existing code.
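The RS removal can be sketched on a toy operator graph.  Everything here (the Op class, the
"TS"/"RS"/"MAPJOIN" labels, removeReduceSinkParents) is a hypothetical simplification, not Hive's
Operator API, and it assumes each RS has exactly one parent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy operator DAG (not Hive's Operator API) showing the rewrite:
 * splice out RS operators that sit between a small-table branch and the join.
 */
public class RemoveSmallTableReduceSinks {

    static final class Op {
        final String type; // toy labels such as "TS", "RS", "MAPJOIN"
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        Op(String type) { this.type = type; }
    }

    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /** Replaces each RS parent of the join with that RS's own single parent. */
    static void removeReduceSinkParents(Op join) {
        for (int i = 0; i < join.parents.size(); i++) {
            Op rs = join.parents.get(i);
            if (!rs.type.equals("RS")) {
                continue; // the big-table branch has no RS in front of the join
            }
            if (rs.parents.size() != 1) {
                throw new IllegalStateException("toy model expects one parent per RS");
            }
            Op upstream = rs.parents.get(0);
            // Keep the join input position unchanged.
            join.parents.set(i, upstream);
            upstream.children.set(upstream.children.indexOf(rs), join);
            rs.parents.clear();
            rs.children.clear();
        }
    }

    static List<String> parentTypes(Op op) {
        List<String> types = new ArrayList<>();
        for (Op p : op.parents) {
            types.add(p.type);
        }
        return types;
    }

    public static void main(String[] args) {
        Op bigScan = new Op("TS");
        Op smallScan = new Op("TS");
        Op rs = new Op("RS");
        Op join = new Op("MAPJOIN");
        connect(bigScan, join);  // big table feeds the join directly
        connect(smallScan, rs);  // small table goes through an RS first
        connect(rs, join);
        System.out.println(parentTypes(join)); // [TS, RS]
        removeReduceSinkParents(join);
        System.out.println(parentTypes(join)); // [TS, TS]
    }
}
```

The splice keeps each input at its original position in the join's parent list; in this toy
model that position is what distinguishes the big-table input from the small-table ones, so
reordering the parents would silently change the plan.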

I also found an issue in NonBlockingOpDeDupProc that corrupts the 'mapJoinContext' data structure
in the parse context.  A fix is offered in HIVE-9117, which should be committed to trunk and
merged first; it is included here for reference.


Diffs
-----

  itests/src/test/resources/testconfiguration.properties fd732c1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 5e0959a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinHintOptimizer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java 6a47513 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 5227d92 
  ql/src/test/results/clientpositive/spark/smb_mapjoin9.q.out d769ebe 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_1.q.out 8d0527e 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_10.q.out 2df87cf 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out 5637206 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out 3aed084 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 6ed680d 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out a4fd7c3 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 6293450 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_2.q.out 1cf144b 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_3.q.out 6b44d2c 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out d07d65a 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out 607b1f0 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_6.q.out 30746ff 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out c48ed6d 

Diff: https://reviews.apache.org/r/29281/diff/


Testing
-------

Re-enabled all the smb_mapjoin.* tests.

I saw that many of the tests were again not alphabetized, so I re-ran the script to alphabetize
them.  While doing so, I realized that some tests, like 'bucket_map_join_spark.*' and 'join_empty',
were missing the comma that separates them from the next test and were probably not run.  I also
fixed windowing.q, which is the last test.  These fixes are unrelated to this change, and since
these tests may have been unintentionally disabled, I am not sure whether they will trigger
additional test failures.
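To illustrate why a missing comma can silently drop tests: assuming the list is read as a
java.util.Properties value with backslash line continuations and then split on commas (an
assumption about the test harness), two entries without a comma between them collapse into a
single name that matches no .q file.  The key and the test names below are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class QFileListSketch {

    /** Loads a properties snippet and splits one key's value on commas (assumed harness behavior). */
    static List<String> testNames(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            // Properties joins "\"-continued lines and drops leading whitespace on each continuation.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical entries; in "broken" the first line is missing its comma.
        String broken = "spark.query.files=join_empty.q \\\n"
                + "  other_test.q,\\\n"
                + "  windowing.q\n";
        String fixed = "spark.query.files=join_empty.q,\\\n"
                + "  other_test.q,\\\n"
                + "  windowing.q\n";
        System.out.println(testNames(broken, "spark.query.files"));
        // [join_empty.q other_test.q, windowing.q]
        System.out.println(testNames(fixed, "spark.query.files"));
        // [join_empty.q, other_test.q, windowing.q]
    }
}
```

In the broken case neither join_empty.q nor other_test.q appears as a name of its own, which
matches the symptom of those tests quietly not running.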


Thanks,

Szehon Ho

