From Gopal Vijayaraghavan <>
Subject Re: Hash table in map join - Hive
Date Tue, 28 Jun 2016 02:25:26 GMT

> 1. OOM condition -- I get the following error when I force a map join in
>hive/tez with low container size and heap size:"
>java.lang.OutOfMemoryError: Java heap space". I was wondering what is the
>condition which leads to this error.

You are not modifying the noconditionaltasksize to match the Xmx at all.

hive.auto.convert.join.noconditionaltask.size = (Xmx - io.sort.mb)/3.0;
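
As a rough worked example of sizing the threshold as (Xmx - io.sort.mb)/3.0, here is a small sketch. The heap and sort-buffer values are made-up placeholders, not numbers from this thread, and the result is converted to bytes because hive.auto.convert.join.noconditionaltask.size is given in bytes.

```python
# Hypothetical container settings in megabytes (NOT taken from the thread).
XMX_MB = 1024      # JVM heap, e.g. -Xmx1024m
IO_SORT_MB = 256   # sort buffer (io.sort.mb)


def noconditional_task_size(xmx_mb, io_sort_mb):
    """Return (Xmx - io.sort.mb) / 3.0, converted from MB to bytes."""
    usable_mb = xmx_mb - io_sort_mb
    if usable_mb <= 0:
        raise ValueError("io.sort.mb must be smaller than Xmx")
    return int(usable_mb / 3.0 * 1024 * 1024)


size_bytes = noconditional_task_size(XMX_MB, IO_SORT_MB)
# Print the setting in the form you would paste into a Hive session.
print(f"set hive.auto.convert.join.noconditionaltask.size={size_bytes};")
```

With these placeholder values the usable heap is 768 MB, so a third of it is 256 MB. Plug in your own container's Xmx and sort buffer instead.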

> 2.  Shuffle Hash Join -- I am using hive 2.0.1. What is the way to force
>this join implementation? Is there any documentation regarding the same?


For full-fledged speed-mode, do

set hive.vectorized.execution.reduce.enabled=true;
set hive.optimize.dynamic.partition.hashjoin=true;
set hive.mapjoin.hybridgrace.hashtable=false;

> 3. Hash table size: I use "--hiveconf hive.root.logger=INFO,console" for
>seeing logs. I do not see the hash table size in these logs.

No, the hashtables are no longer built on the gateway nodes - that used to
be a single point of failure when 20-25 users are connected via the same
gateway node.

The hashtable logs are on the task side (in this case, I would guess Map
2's logs would have them). You can pull out the relevant lines with
something like

yarn logs -applicationId <app-id> | grep 'Map.*metrics'

> Map 1                      3                0            0
>37.11             65,710              1,039     15,000,000

So you have 15 million keys going into a single hashtable? The broadcast
output rows are fed into the hashtable on the other side.

The map-join sort of runs out of steam after about 4 million entries - I
would guess for your scenario setting the noconditional size to 8388608
(8 MB) might trigger the good path.
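
To make that comparison concrete, here is a minimal sketch that reads the row count off the summary line quoted above and checks it against the roughly 4 million entry mark. It assumes the last column of that row is the output row count (which is how I am reading it), and the constant name is only for illustration.

```python
# Summary row as quoted above, joined back onto one line.
ROW = ("Map 1                      3                0            0 "
       "37.11             65,710              1,039     15,000,000")

# Rough point where the map-join stops paying off (from the text above).
MAPJOIN_COMFORT_LIMIT = 4_000_000


def output_rows(row):
    """Parse the last column, assumed to be the output row count."""
    return int(row.split()[-1].replace(",", ""))


rows = output_rows(ROW)
# Report the row count and whether it is past the ~4 million mark.
print(rows, rows > MAPJOIN_COMFORT_LIMIT)
```

If the check comes out true, the build side is probably too big for a broadcast hashtable and lowering the noconditional size is worth trying.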


