hive-dev mailing list archives

From "Chao Sun" <chao....@cloudera.com>
Subject Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
Date Wed, 10 Dec 2014 04:48:24 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28889/
-----------------------------------------------------------

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8911
    https://issues.apache.org/jira/browse/HIVE-8911


Repository: hive-git


Description
-------

Basically, the idea is to reuse as much code from MR as possible.

The issue is that, in MR's MapJoinProcessor, after a join operator is converted to a map join
operator, all of its parent ReduceSinkOperators are removed. However, on our Spark branch we need
to preserve them, because they serve as boundaries between BaseWorks, and SparkReduceSinkMapJoinProc
is triggered on them.
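
To make the "boundaries between BaseWorks" point concrete, here is a toy model, not Hive's
Operator or BaseWork classes. It assumes, as the paragraph above implies, that each ReduceSink
operator ends one work and starts another, and counts the works an operator tree would be split
into before conversion, after an MR-style conversion (all parent ReduceSinks removed), and after
a conversion that keeps the small table's ReduceSink.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Op is a hypothetical stand-in, NOT Hive's Operator class.
public class WorkBoundaries {

    static final class Op {
        final String name;
        final boolean reduceSink; // true for a ReduceSinkOperator stand-in
        final List<Op> parents = new ArrayList<>();

        Op(String name, boolean reduceSink, Op... parents) {
            this.name = name;
            this.reduceSink = reduceSink;
            for (Op p : parents) {
                this.parents.add(p);
            }
        }
    }

    // Counts works, assuming every ReduceSink starts a new upstream work.
    // Assumes the operators above root form a tree (no visited set needed).
    static int countWorks(Op root) {
        int works = 1; // the work that contains root itself
        List<Op> stack = new ArrayList<>();
        stack.add(root);
        while (!stack.isEmpty()) {
            Op op = stack.remove(stack.size() - 1);
            for (Op parent : op.parents) {
                if (parent.reduceSink) {
                    works++; // crossing a boundary into another work
                }
                stack.add(parent);
            }
        }
        return works;
    }

    public static void main(String[] args) {
        Op bigScan = new Op("TS[big]", false);
        Op smallScan = new Op("TS[small]", false);

        // Before conversion: both join inputs go through a ReduceSink.
        Op join = new Op("JOIN", false,
                new Op("RS[big]", true, bigScan),
                new Op("RS[small]", true, smallScan));

        // MR-style conversion: every parent ReduceSink is removed.
        Op mrMapJoin = new Op("MAPJOIN", false, bigScan, smallScan);

        // Conversion that keeps the small-table ReduceSink as a boundary.
        Op sparkMapJoin = new Op("MAPJOIN", false,
                bigScan, new Op("RS[small]", true, smallScan));

        System.out.println(countWorks(join));         // 3
        System.out.println(countWorks(mrMapJoin));    // 1
        System.out.println(countWorks(sparkMapJoin)); // 2
    }
}
```

In the MR-style result the small-table branch collapses into the same work as the map join,
so there is no ReduceSink left for SparkReduceSinkMapJoinProc to act on.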

Initially, I tried to move this part of the logic to SparkMapJoinOptimizer, which runs at a
later stage. Although this works, I'm worried it may affect SMB join with hints too much,
because we would then have to move that part of the logic to SparkMapJoinOptimizer as well.
In general, I want to minimize the impact on existing code paths.

This patch changes MapJoinProcessor. I created a separate method, convertMapJoinForSpark,
which doesn't remove the ReduceSinkOperators for small tables. The transform method then
decides which method to call based on the execution engine.
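
The dispatch described above can be sketched roughly as follows. This is a hypothetical
simplification, not the actual MapJoinProcessor code: the Input class, the signatures, and the
bigTablePos parameter are illustrative assumptions, and "only the big table's ReduceSink is
removed on Spark" is one reading of "doesn't remove the ReduceSinkOperators for small tables".
The engine is read from the real hive.execution.engine property, assumed to default to "mr".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of engine-based dispatch; NOT the real MapJoinProcessor.
public class MapJoinDispatch {

    // One join input: a table name and whether it still reaches the join
    // through a ReduceSink operator.
    static final class Input {
        final String table;
        final boolean viaReduceSink;

        Input(String table, boolean viaReduceSink) {
            this.table = table;
            this.viaReduceSink = viaReduceSink;
        }

        @Override
        public String toString() {
            return viaReduceSink ? "RS(" + table + ")" : table;
        }
    }

    // MR behaviour: every parent ReduceSink is removed.
    // bigTablePos is unused here; kept only for a uniform signature.
    static List<Input> convertMapJoin(List<Input> inputs, int bigTablePos) {
        List<Input> out = new ArrayList<>();
        for (Input in : inputs) {
            out.add(new Input(in.table, false));
        }
        return out;
    }

    // Assumed Spark behaviour: small-table ReduceSinks are kept as work
    // boundaries; only the big table's ReduceSink is removed.
    static List<Input> convertMapJoinForSpark(List<Input> inputs, int bigTablePos) {
        List<Input> out = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            Input in = inputs.get(i);
            out.add(new Input(in.table, i != bigTablePos && in.viaReduceSink));
        }
        return out;
    }

    // Picks the conversion based on hive.execution.engine ("mr" if unset).
    static List<Input> transform(Map<String, String> conf, List<Input> inputs, int bigTablePos) {
        String engine = conf.getOrDefault("hive.execution.engine", "mr");
        if ("spark".equalsIgnoreCase(engine)) {
            return convertMapJoinForSpark(inputs, bigTablePos);
        }
        return convertMapJoin(inputs, bigTablePos);
    }

    public static void main(String[] args) {
        List<Input> inputs = Arrays.asList(
                new Input("big", true), new Input("small", true));

        Map<String, String> mr = new HashMap<>();
        mr.put("hive.execution.engine", "mr");
        Map<String, String> spark = new HashMap<>();
        spark.put("hive.execution.engine", "spark");

        System.out.println(transform(mr, inputs, 0));    // [big, small]
        System.out.println(transform(spark, inputs, 0)); // [big, RS(small)]
    }
}
```

Keeping the choice in one place (transform) leaves the MR path untouched, which matches the
stated goal of minimizing the impact on existing code paths.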

I also had to disable several tests related to SMB join with hints. They can be re-enabled once
HIVE-8640 is resolved.


Diffs
-----

  data/conf/spark/hive-site.xml 44eac86 
  itests/src/test/resources/testconfiguration.properties d6f8267 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827 
  ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73 
  ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b 
  ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151 
  ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77 
  ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5 
  ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef 
  ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05 
  ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e 
  ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12 
  ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3 
  ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c 
  ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566 
  ql/src/test/results/clientpositive/spark/join25.q.out 284c97d 
  ql/src/test/results/clientpositive/spark/join26.q.out e271184 
  ql/src/test/results/clientpositive/spark/join27.q.out d31f29e 
  ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa 
  ql/src/test/results/clientpositive/spark/join36.q.out f1317ea 
  ql/src/test/results/clientpositive/spark/join37.q.out 448e983 
  ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea 
  ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b 
  ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d 
  ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99 
  ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9 
  ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c 
  ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437 
  ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24 
  ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b 
  ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be 

Diff: https://reviews.apache.org/r/28889/diff/


Testing
-------

bucket_map_join_1.q
bucket_map_join_2.q
bucketmapjoin1.q
bucketmapjoin10.q
bucketmapjoin11.q
bucketmapjoin12.q
bucketmapjoin13.q
bucketmapjoin2.q
bucketmapjoin3.q
bucketmapjoin4.q
bucketmapjoin5.q
bucketmapjoin7.q
bucketmapjoin8.q
bucketmapjoin9.q
bucketmapjoin_negative.q
bucketmapjoin_negative2.q
column_access_stats.q
join25.q
join26.q
join27.q
join30.q
join36.q
join37.q
join38.q
join39.q
join40.q
join_empty.q
join_filters_overlap.q
join_map_ppr.q
mapjoin1.q
mapjoin_distinct.q
mapjoin_filter_on_outerjoin.q
mapjoin_hook.q
mapjoin_test_outer.q
semijoin.q
skewjoin.q
table_access_keys_stats.q


Thanks,

Chao Sun

