hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
Date Mon, 06 Dec 2010 07:10:11 GMT


Namit Jain commented on HIVE-1830:

After HIVE-1642, joins are automatically converted into map-joins during physical optimization.

However, this may lead to problems.

For example, consider the query:

select T1.val, count(1) from T1 join T2 on T1.key=T2.key group by T1.val

This query compiles into two map-reduce jobs: one for the join and one for the group by.

Before HIVE-1642, the partial aggregation for the group by was performed in the reducer that
performs the join. After HIVE-1642, it is performed in the mapper instead. The local task
verifies that there is just enough memory to hold the map-join data. However, it does not take
into account the memory needed for the partial group by.

So, when a group by follows a join, it is a good idea to lower the memory limit the local task
uses to validate whether the small table fits. This limit can be controlled by a new
configuration parameter, with some default such as 70% of total memory (instead of 90%).
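A minimal sketch of the proposed check, assuming the local task compares used heap against max heap and fails if the ratio exceeds a limit. The class name, method names and the two constants are hypothetical illustrations of the 90% / 70% values above, not Hive's actual classes or configuration keys.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: not Hive's real local-task code or config keys.
public class MapJoinMemoryCheck {
    // Existing limit on heap usage while loading the small table (90%).
    static final double DEFAULT_MAX_USAGE = 0.90;
    // Hypothetical lower limit when a group by runs in the same mapper (70%),
    // leaving headroom for the partial aggregation hash table.
    static final double MAX_USAGE_WITH_GROUP_BY = 0.70;

    /** Picks the heap-usage limit for the local task. */
    static double maxUsage(boolean followedByGroupBy) {
        return followedByGroupBy ? MAX_USAGE_WITH_GROUP_BY : DEFAULT_MAX_USAGE;
    }

    /** Returns true if used/max heap stays within the chosen limit. */
    static boolean fitsInMemory(long usedBytes, long maxBytes, boolean followedByGroupBy) {
        double usage = (double) usedBytes / maxBytes;
        return usage <= maxUsage(followedByGroupBy);
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when undefined; fall back to Runtime.maxMemory().
        long max = heap.getMax() > 0 ? heap.getMax() : Runtime.getRuntime().maxMemory();
        System.out.println("small table fits: " + fitsInMemory(heap.getUsed(), max, true));
    }
}
```

With 800 MB used out of 1000 MB, the plain map-join check passes (0.8 <= 0.9) but the group-by-aware check fails (0.8 > 0.7), so the query would fall back to a regular join instead of risking an OOM in the mapper.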

Also, the group by may still run out of memory, so it might be a good idea for the group by to
check for free memory and flush its in-memory aggregates periodically.
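A minimal sketch of a map-side partial `count(1)` that checks heap usage every few rows and flushes its hash table when usage crosses a threshold. The class, the threshold, the check interval and the `flushed` list (standing in for emitting partial rows to the reducer) are all hypothetical; this is not Hive's GroupByOperator. The heap-usage source is injected so the behavior can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleSupplier;

// Hypothetical sketch of periodic flushing in a map-side partial aggregation.
public class PartialGroupBy {
    private final Map<String, Long> counts = new HashMap<>();
    // Stands in for partial rows emitted to the reducer.
    private final List<Map<String, Long>> flushed = new ArrayList<>();
    private final DoubleSupplier heapUsage; // fraction of heap in use, 0.0 to 1.0
    private final double flushThreshold;
    private final int checkInterval;        // rows between memory checks
    private int rowsSinceCheck = 0;

    PartialGroupBy(DoubleSupplier heapUsage, double flushThreshold, int checkInterval) {
        this.heapUsage = heapUsage;
        this.flushThreshold = flushThreshold;
        this.checkInterval = checkInterval;
    }

    /** Aggregates one row (its T1.val) and periodically checks memory. */
    void process(String val) {
        counts.merge(val, 1L, Long::sum);
        if (++rowsSinceCheck >= checkInterval) {
            rowsSinceCheck = 0;
            if (heapUsage.getAsDouble() > flushThreshold) {
                flush();
            }
        }
    }

    /** Emits the current partial counts and frees the hash table. */
    void flush() {
        if (!counts.isEmpty()) {
            flushed.add(new HashMap<>(counts));
            counts.clear();
        }
    }

    List<Map<String, Long>> flushedBatches() { return flushed; }
    int bufferedKeys() { return counts.size(); }

    /** Heap usage of the current JVM, as a fraction of the max heap. */
    static DoubleSupplier jvmHeapUsage() {
        Runtime rt = Runtime.getRuntime();
        return () -> (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }

    public static void main(String[] args) {
        PartialGroupBy gby = new PartialGroupBy(jvmHeapUsage(), 0.70, 1000);
        for (int i = 0; i < 10_000; i++) {
            gby.process("val" + (i % 7));
        }
        gby.flush(); // final flush at end of input
        System.out.println("batches emitted: " + gby.flushedBatches().size());
    }
}
```

Flushing early is safe here because `count(1)` is decomposable: the reducer sums the partial counts for each key, so emitting the same key in several batches only costs extra shuffle data, not correctness.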

> mappers in group followed by joins may die OOM
> ----------------------------------------------
>                 Key: HIVE-1830
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Liyin Tang

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
