hive-dev mailing list archives

From "Szehon Ho" <>
Subject Review Request 28500: HIVE-8943 : Fix memory limit check for combine nested mapjoins [Spark Branch]
Date Thu, 27 Nov 2014 06:13:20 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.

Bugs: HIVE-8943

Repository: hive-git


By default, SparkMapJoinOptimizer combines nested mapjoins into a single work, because the
ReduceSink (RS) for the big table is removed. The mapjoin check therefore needs to be enhanced
to calculate whether all the MapJoins in that work (Spark stage) fit into memory together;
otherwise they might overwhelm the memory of that particular Spark executor.
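
To illustrate the idea, here is a minimal sketch of a cumulative memory check. It is not the
code in this patch: the class, field, and method names (MapJoinMemoryCheck, MapJoinCandidate,
Work, fitsInWork, assign) are hypothetical, and the sizes are assumed to come from table
statistics. A mapjoin joins the current work only if the small-table sizes already in that
work plus its own stay within the limit. What happens to a join that does not fit is also an
assumption here: the sketch simply starts a new work for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's actual classes or API.
class MapJoinMemoryCheck {

    /** Stand-in for one mapjoin: estimated size of its small (hash) tables, in bytes. */
    static final class MapJoinCandidate {
        final String name;
        final long smallTableBytes;

        MapJoinCandidate(String name, long smallTableBytes) {
            this.name = name;
            this.smallTableBytes = smallTableBytes;
        }
    }

    /** Stand-in for a BaseWork (one Spark stage) holding the mapjoins combined into it. */
    static final class Work {
        final List<MapJoinCandidate> mapJoins = new ArrayList<>();

        /** Total small-table bytes of every mapjoin already placed in this work. */
        long usedBytes() {
            long total = 0;
            for (MapJoinCandidate j : mapJoins) {
                total += j.smallTableBytes;
            }
            return total;
        }
    }

    /** The check itself: does the candidate fit next to the mapjoins already in the work? */
    static boolean fitsInWork(Work work, MapJoinCandidate candidate, long limitBytes) {
        return work.usedBytes() + candidate.smallTableBytes <= limitBytes;
    }

    /**
     * Walks the nested mapjoins in order and groups them into works.
     * Assumption: a join that does not fit opens a new work. A single join
     * larger than the limit is still placed alone in its own work.
     */
    static List<Work> assign(List<MapJoinCandidate> joins, long limitBytes) {
        List<Work> works = new ArrayList<>();
        Work current = new Work();
        works.add(current);
        for (MapJoinCandidate j : joins) {
            if (!current.mapJoins.isEmpty() && !fitsInWork(current, j, limitBytes)) {
                current = new Work();
                works.add(current);
            }
            current.mapJoins.add(j);
        }
        return works;
    }
}
```

With two joins of 600 bytes each, assign(joins, 1000) yields two works with one mapjoin
apiece, while an effectively unlimited threshold keeps both in a single work.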


  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/ 819eef1

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 0c339a5

  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION 
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION 



Added two unit tests:

1.  auto_join_stats, which sets a memory limit and checks that the algorithm does not put more
than one mapjoin in a single BaseWork.
2.  auto_join_stats2, which runs the same query without a memory limit and checks that the
algorithm puts all mapjoins in one BaseWork, because they fit.


Szehon Ho
