hive-user mailing list archives

From Gopal Vijayaraghavan
Subject Re: Hash table in map join - Hive
Date Fri, 15 Jul 2016 23:22:54 GMT

> When is the OOM error actually thrown? With
> hive.mapjoin.hybridgrace.hashtable set to true, spilling should be
> possible, so the OOM error should not occur.
> Is it the case when the hash table of not even one of the 16 partitions
> fits in memory?

It will OOM if any one of them overflows.

The grace hash-join works really well when the memory side is a primary
key and doesn't have this sort of skew.

The real problem with the shuffle hash-join is that it amplifies the
skews, since it distributes rows by the hashcode of the join keys.
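
To illustrate the skew point, here is a minimal sketch of hash
partitioning over a skewed key set. It is a toy partitioner, not Hive's
implementation: the class SkewDemo, its methods, the hot key 42 and the
row counts are all hypothetical. It only shows that every row of a hot
key lands in the same partition, whatever the partition count.

```java
import java.util.Arrays;

// Illustrative only: a toy hash partitioner, NOT Hive's implementation.
final class SkewDemo {
    // Pick a partition from the key's hash code (assumed scheme).
    static int partitionOf(long key, int numPartitions) {
        return Math.floorMod(Long.hashCode(key), numPartitions);
    }

    // Count how many rows each partition receives.
    static long[] rowsPerPartition(long[] keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (long key : keys) {
            counts[partitionOf(key, numPartitions)]++;
        }
        return counts;
    }

    static long largest(long[] counts) {
        return Arrays.stream(counts).max().orElse(0L);
    }

    // hotRows copies of one key (42), plus distinctRows unique keys.
    static long[] skewedKeys(int hotRows, int distinctRows) {
        long[] keys = new long[hotRows + distinctRows];
        Arrays.fill(keys, 0, hotRows, 42L);
        for (int i = 0; i < distinctRows; i++) {
            keys[hotRows + i] = 1000L + i;
        }
        return keys;
    }

    public static void main(String[] args) {
        long[] keys = skewedKeys(900_000, 100_000);
        for (int n : new int[] {16, 100}) {
            System.out.println(n + " partitions: largest holds "
                + largest(rowsPerPartition(keys, n)) + " rows");
        }
    }
}
```

In this toy data set the largest partition never drops below the
900,000 rows of the hot key, so going from 16 to 100 partitions barely
changes the size of the one partition that has to fit in memory.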

> But increasing the partitions to 100 also did not solve the problem
> (this is in the case of 3G container size and 5G small table size).
> I have given a high value for
> so that the broadcast hash
> join path is picked.

Unfortunately, it will only start spilling after it overflows
"" or 55% of the heap.

set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.1;

    // Get the total available memory from memory manager
    long totalMapJoinMemory = desc.getMemoryNeeded();
    "Memory manager allocates " + totalMapJoinMemory + " bytes for the loading hashtable.");
    if (totalMapJoinMemory <= 0) {
      totalMapJoinMemory = HiveConf.getLongVar(

    long processMaxMemory =
    if (totalMapJoinMemory > processMaxMemory) {
      float hashtableMemoryUsage = HiveConf.getFloatVar(
      LOG.warn("totalMapJoinMemory value of " + totalMapJoinMemory +
          " is greater than the max memory size of " + processMaxMemory);
      // Don't want to attempt to grab more memory than we have available
      // .. percentage is a bit arbitrary
      totalMapJoinMemory = (long) (processMaxMemory *
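
The excerpt above is truncated, so here is a self-contained sketch of
the same clamping logic as it reads from the visible lines. The class
MapJoinMemorySketch, the method and its parameter names are
hypothetical; fallbackBytes and heapFraction stand in for the two
HiveConf values whose names are cut off in the excerpt, and 0.55 is
taken from the "55% of the heap" figure mentioned earlier. The 3 GiB
heap is an assumption for illustration (a real heap is smaller than
its container).

```java
// Sketch of the clamping logic in the quoted excerpt; names are hypothetical.
final class MapJoinMemorySketch {
    static long effectiveHashTableMemory(long requestedBytes,
                                         long fallbackBytes,
                                         long processMaxMemory,
                                         float heapFraction) {
        long total = requestedBytes;
        if (total <= 0) {
            // Nothing allocated by the memory manager: use a configured value.
            total = fallbackBytes;
        }
        if (total > processMaxMemory) {
            // Never plan for more than the heap can hold; take a fraction of it.
            total = (long) (processMaxMemory * heapFraction);
        }
        return total;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Assumed scenario: 5 GiB requested for the small table, 3 GiB heap.
        long budget = effectiveHashTableMemory(5 * gib, 10_000_000L, 3 * gib, 0.55f);
        System.out.println("hash table budget in bytes: " + budget);
    }
}
```

Under these assumptions the budget comes out at roughly 55% of the
3 GiB heap, far below the 5 GiB the small table needs.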

If you can devise an example using the TPC-DS schema on hive-testbench, I can
run it locally and tell you what's happening.

