hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Hash table in map join - Hive
Date Fri, 15 Jul 2016 23:22:54 GMT

> When is OOM error actually thrown? With
>hive.mapjoin.hybridgrace.hashtable set to true, spilling should be
>possible, so OOM error should not come.
...
> Is it the case when the hash table of not even one of the 16 partitions
>fits in memory? 

It will OOM if any one of them overflows.

The grace hash-join works really well when the memory side is a primary
key and doesn't have these sort of skews.


The real problem with the shuffle hash-join is that it amplifies the
skews, since it is doing distributions on the hashcode of the join keys.

> But increasing the partitions to 100 also did not solve the problem
>(This is in the case of 3G container size and 5G small table size.
> I have given a high value for
>hive.auto.convert.join.noconditionaltask.size so that the broadcast hash
>join path is picked.

Unfortunately, it will only start spilling after it overflows
"hive.auto.convert.join.noconditionaltask.size" or 55% of the heap.

set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.1;


    // Get the total available memory from memory manager
    long totalMapJoinMemory = desc.getMemoryNeeded();
    LOG.info("Memory manager allocates " + totalMapJoinMemory + " bytes
for the loading hashtable.");
    if (totalMapJoinMemory <= 0) {
      totalMapJoinMemory = HiveConf.getLongVar(
        hconf, 
HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
    }

    long processMaxMemory =
ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    if (totalMapJoinMemory > processMaxMemory) {
      float hashtableMemoryUsage = HiveConf.getFloatVar(
          hconf, HiveConf.ConfVars.HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE);
      LOG.warn("totalMapJoinMemory value of " + totalMapJoinMemory +
          " is greater than the max memory size of " + processMaxMemory);
      // Don't want to attempt to grab more memory than we have available
.. percentage is a bit arbitrary
      totalMapJoinMemory = (long) (processMaxMemory *
hashtableMemoryUsage);
    }


If you can devise an example using TPC-DS schema on hive-testbench, I can
run it locally and tell you what's happening.


Cheers,
Gopal



Mime
View raw message