hive-dev mailing list archives

From "Szehon Ho" <sze...@cloudera.com>
Subject Review Request 30443: HIVE-9192 : One-pass SMB Optimizations [Spark Branch]
Date Fri, 30 Jan 2015 03:27:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30443/
-----------------------------------------------------------

Review request for hive and Xuefu Zhang.


Repository: hive-git


Description
-------

This patch refactors the SMB MapJoin optimizations in Spark so that they run in one pass.  The
main part of the SMB MapJoin optimization is annotating the MapWork with information from the
SMBMapJoinOperator and its roots (TableScans).

Instead of initializing and annotating the MapWork in SparkSortMergeJoinFactory during a second
pass, both GenSparkWork and SparkSortMergeJoinFactory now collect the needed information during
the single pass.  After that pass, we go through all the SMBMapJoinOperators and annotate their
MapWorks.
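A rough sketch of the collect-then-annotate idea described above. This is a heavily simplified, hypothetical model and not code from the patch. `TableScan`, `MapWork`, `SmbJoin`, `Context`, `visitTableScan`, `visitSmbJoin` and `annotateAll` are illustrative stand-ins loosely named after Hive's TableScanOperator, MapWork and SMBMapJoinOperator. The real SparkSMBMapJoinInfo may hold entirely different fields. The sketch only shows the shape of the idea: the walk records facts in a shared context, and annotation runs once after the walk. As a result, the order in which the walker reaches a join and its table scans does not matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Hive's operator/work classes.
// None of these names or fields are taken from the actual patch.
public class OnePassSmbSketch {

    /** A table scan root; identity equality, like operator objects. */
    static final class TableScan {
        final String alias;
        TableScan(String alias) { this.alias = alias; }
    }

    /** Simplified MapWork: a name plus the annotations attached after the pass. */
    static final class MapWork {
        final String name;
        final List<String> smbAnnotations = new ArrayList<>();
        MapWork(String name) { this.name = name; }
    }

    /** Simplified SMB map join: a name plus its TableScan roots. */
    static final class SmbJoin {
        final String name;
        final List<TableScan> roots;
        SmbJoin(String name, List<TableScan> roots) { this.name = name; this.roots = roots; }
    }

    /** Shared context filled in during the single walk (hypothetical). */
    static final class Context {
        final Map<TableScan, MapWork> scanToWork = new LinkedHashMap<>();
        final List<SmbJoin> smbJoins = new ArrayList<>();
    }

    /** During the walk (cf. GenSparkWork): remember which MapWork each scan landed in. */
    static void visitTableScan(Context ctx, TableScan ts, MapWork work) {
        ctx.scanToWork.put(ts, work);
    }

    /** During the walk (cf. SparkSortMergeJoinFactory): just record the join. */
    static void visitSmbJoin(Context ctx, SmbJoin join) {
        ctx.smbJoins.add(join);
    }

    /** After the walk: annotate the MapWork that contains each join's roots. */
    static void annotateAll(Context ctx) {
        for (SmbJoin join : ctx.smbJoins) {
            MapWork target = null;
            List<String> aliases = new ArrayList<>();
            for (TableScan root : join.roots) {
                MapWork w = ctx.scanToWork.get(root);
                if (w == null) {
                    throw new IllegalStateException("root " + root.alias + " was never visited");
                }
                if (target != null && target != w) {
                    throw new IllegalStateException(join.name + " spans multiple MapWorks");
                }
                target = w;
                aliases.add(root.alias);
            }
            if (target != null) {
                target.smbAnnotations.add(join.name + " <- " + String.join(",", aliases));
            }
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        TableScan a = new TableScan("a");
        TableScan b = new TableScan("b");
        SmbJoin join = new SmbJoin("SMBJoin_1", Arrays.asList(a, b));
        MapWork map1 = new MapWork("Map 1");

        // Walk order is arbitrary: here the join is seen before its scans.
        visitSmbJoin(ctx, join);
        visitTableScan(ctx, a, map1);
        visitTableScan(ctx, b, map1);

        annotateAll(ctx);
        System.out.println(map1.name + ": " + map1.smbAnnotations); // Map 1: [SMBJoin_1 <- a,b]
    }
}
```

Deferring annotation until after the walk is what lets a single traversal suffice in this model. When the join is visited, its TableScans may not yet have been assigned to a MapWork.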


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java 6e0ac38
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 773cfbd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 0eac6e1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java cb5d4fe 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 3a7477a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkSMBMapJoinInfo.java PRE-CREATION


Diff: https://reviews.apache.org/r/30443/diff/


Testing
-------


Thanks,

Szehon Ho
