hive-user mailing list archives

From Ross Guth <>
Subject Re: Hash table in map join - Hive
Date Wed, 29 Jun 2016 18:53:58 GMT
Hi Gopal,

I saw the log files and the hash table information in them. Thanks.

Also, I forced the shuffle hash join. I had a few questions around it:

1. In the query plan, it still says Map Join Operator (I would have expected
it to be named as a reduce-side operator).
2. The edges in this query plan are named custom_simple_edge: is this what
indicates that sorting of the mapper outputs is bypassed?
3. "" is not taking effect for
shuffle hash join. With the same input tables, in merge join (Shuffle sort
merge join), it took 1009 reducers without auto reducer turned on and took
337 reducers in the other case. While in case of shuffle hash join, it is
not changing from 1009 to 337. Is there something else I need to do, for
getting this optimization feature on, in this case?

I had a few general questions too:
1. What does this parameter do -- does it only reduce the
number of reducers based on the actual size of the mapper output, or does
it do more? As mentioned above, in the sort merge join case, if I manually
set the number of reduce tasks to 337 (using the mapred.reduce.tasks
parameter), the execution time does not improve as much as when the
parameter picks it by itself.
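
For what it's worth, my mental model of the auto-reduce behavior (a
simplified sketch in Python, not Hive's or Tez's actual code -- the
function name and the scaling rule are my assumptions) is that the runtime
scales the compile-time reducer count down by the ratio of the shuffle
output actually observed to the planner's estimate:

```python
import math

def auto_reduce(planned_reducers, estimated_bytes, actual_bytes):
    """Hypothetical simplification of auto reducer parallelism:
    scale the compile-time reducer count by the ratio of the shuffle
    output observed at runtime to the planner's estimate."""
    scale = actual_bytes / estimated_bytes
    return max(1, math.ceil(planned_reducers * scale))

# With a plan for 1000 reducers but only a fifth of the estimated
# shuffle output materializing at runtime, this would drop to 200.
print(auto_reduce(1000, 10 * 2**30, 2 * 2**30))
```

Under that model, picking the count at runtime beats hand-setting
mapred.reduce.tasks because the decision uses the actual output size
rather than a compile-time guess.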

2. I did not understand the intuition behind setting
hive.mapjoin.hybridgrace.hashtable=false (as mentioned in your previous
reply). Does the hybrid grace hashtable mean the Hybrid Hybrid Grace Hash
Join implementation as mentioned here?
If it is set to true, the hash table is created with multiple partitions.
If it is set to false, is the hash table created as a single hash table?
Isn't the true case better, since it handles the case where the hash table
cannot fit in memory, and the lookups are against smaller tables? I ran
both cases (with the grace hashtable set to true and false) and did not see
any difference in execution time -- maybe because my input size was
considerably small in that case.
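
To make my reading of the partitioned case concrete, here is a toy Python
sketch (purely illustrative, not Hive's implementation -- the partition
count and per-partition budget are my assumptions) of a grace-style build
side where keys are hashed into 16 partitions and any partition that
outgrows its memory budget spills as a unit:

```python
# Illustrative sketch of a partitioned (grace-style) build side. Keys are
# hashed into a fixed number of partitions; rows that arrive after a
# partition hits its in-memory budget are spilled (here: a list standing
# in for a disk file). Not Hive's actual code.
NUM_PARTITIONS = 16

def build_partitioned(rows, budget_per_partition):
    in_memory = [dict() for _ in range(NUM_PARTITIONS)]
    spilled = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in rows:
        p = hash(key) % NUM_PARTITIONS
        if len(in_memory[p]) < budget_per_partition:
            in_memory[p][key] = value
        else:
            spilled[p].append((key, value))  # would go to a disk file
    return in_memory, spilled
```

With hybridgrace=false, as I understand it, everything would instead land
in one monolithic dict with no per-partition spill boundary.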

3. In general, for a map join in cluster mode, are these the actual steps
followed in hive/tez:
 a. *Hash table generation:* Partitioned hash tables of the small table are
created across multiple containers. Each container handles a part of the
small table and builds the hash table for that part, in 16 partitions. If
any partition cannot fit in memory, it is spilled to disk (with only a
disk file and no match file, since no matching with the big table happens
at this stage).
b. *Broadcast of hash table:* All the partitions of all the parts of the
small table, including the ones spilled to disk, are serialized and sent
to all the second-stage map containers.
c. *Join operator:* The big table is scanned in each second-stage mapper
and probed against the entire hash table of the small table, producing the
result.
Where does the rebuilding of the spilled hash table happen? Is it during
the second map phase, where the join with the bigger table happens?
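
Steps a-c above, as I understand them, boil down to something like this
toy Python sketch of build-broadcast-probe (my reading only, not Hive's
code; partitioning and spilling are left out for brevity):

```python
def build_hash_table(small_table):
    # Step a: build side -- hash the small table by join key.
    table = {}
    for key, value in small_table:
        table.setdefault(key, []).append(value)
    return table

def probe(big_table, hash_table):
    # Step c: each mapper scans its split of the big table and probes
    # the (broadcast) hash table, emitting one row per match.
    for key, big_value in big_table:
        for small_value in hash_table.get(key, []):
            yield (key, big_value, small_value)

# Step b (broadcast) would ship `hash_table` to every second-stage
# mapper; here we simply pass it to probe() directly.
small = [(1, "a"), (2, "b")]
big = [(1, "x"), (3, "y"), (2, "z")]
print(list(probe(big, build_hash_table(small))))
```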

Apologies for the long list of questions, but knowing this would be very
helpful to me.

Thanks in advance,

On Mon, Jun 27, 2016 at 7:25 PM, Gopal Vijayaraghavan <> wrote:

> > 1. OOM condition -- I get the following error when I force a map join in
> >hive/tez with low container size and heap size:"
> >java.lang.OutOfMemoryError: Java heap space". I was wondering what is the
> >condition which leads to this error.
> You are not modifying the noconditionaltasksize to match the Xmx at all.
> (Xmx - io.sort.mb)/3.0;
> > 2.  Shuffle Hash Join -- I am using hive 2.0.1. What is the way to force
> >this join implementation? Is there any documentation regarding the same?
> <>
> For full-fledged speed-mode, do
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.optimize.dynamic.partition.hashjoin=true;
> set;
> set hive.mapjoin.hybridgrace.hashtable=false;
> > 3. Hash table size: I use "--hiveconf hive.root.logger=INFO,console" for
> >seeing logs. I do not see the hash table size in these logs.
> No, the hashtables are no longer built on the gateway nodes - that used to
> be a single point of failure when 20-25 users are connected via the same
> box.
> The hashtable logs are on the task side (in this case, I would guess Map
> 2's logs would have them). The output comes from a log line which looks like
> yarn logs -applicationId <app-id> | grep Map.*metrics
> > Map 1   3   0   0   37.11   65,710   1,039   15,000,000   15,000,000
> So you have 15 million keys going into a single hashtable? The broadcast
> output rows is fed into the hashtable on the other side.
> The map-join sort of runs out of steam after about ~4 million entries - I
> would guess for your scenario setting the noconditional size to 8388608
> (~8Mb) might trigger the good path.
> Cheers,
> Gopal
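
(Reading the sizing rule quoted above, the arithmetic works out as below --
a quick check assuming the rule is noconditionaltasksize ~= (Xmx -
io.sort.mb)/3.0, with all values in MB; the example numbers are mine:)

```python
# Quick arithmetic check of the quoted sizing rule, assuming it is
# noconditionaltasksize ~= (Xmx - io.sort.mb) / 3.0, all values in MB.
def noconditional_size_mb(xmx_mb, io_sort_mb):
    return (xmx_mb - io_sort_mb) / 3.0

# e.g. a 4 GB container heap with io.sort.mb=512 leaves roughly 1194 MB.
print(int(noconditional_size_mb(4096, 512)))
```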
