hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size
Date Tue, 16 Feb 2010 07:28:27 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1158:
-----------------------------

      Resolution: Fixed
    Release Note: HIVE-1158. Introducing a new parameter for Map-side join bucket size. (Ning Zhang via zshao)
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks Ning!

> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
>                 Key: HIVE-1158
>                 URL: https://issues.apache.org/jira/browse/HIVE-1158
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1158.patch
>
>
> Map-side join caches the small table in memory and joins it with the split of the large
> table on the mapper side. If the small table is too large, it uses RowContainer to cache a
> number of rows given by the parameter hive.join.cache.size, whose default value is 25000.
> This parameter is also used for regular reducer-side joins to cache all input tables except
> the streaming table. This default value is too large for the map-side join bucket size and
> sometimes results in OOM exceptions. We should define a separate parameter so that the two
> cache sizes can be set independently.
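
The idea in the description above can be illustrated with a minimal sketch: a row cache that keeps at most a configured number of rows in memory and spills full blocks to disk, with the cache size looked up under a different key for map-side joins than for reducer-side joins. This is not Hive's RowContainer code. The class `BoundedRowCache`, the function `cache_size_for`, the key `example.mapjoin.bucket.cache.size`, and its value of 100 are all hypothetical; the issue text names only `hive.join.cache.size` and its default of 25000.

```python
import pickle
import tempfile

# Key and default taken from the issue description.
JOIN_CACHE_KEY = "hive.join.cache.size"
# Hypothetical key for the new map-side join setting; the issue text does not name it.
MAPJOIN_CACHE_KEY = "example.mapjoin.bucket.cache.size"


def cache_size_for(join_kind, conf):
    """Pick the cache size for a join; map-side joins use their own setting."""
    if join_kind == "map":
        return conf[MAPJOIN_CACHE_KEY]
    return conf[JOIN_CACHE_KEY]


class BoundedRowCache:
    """Holds at most `cache_size` rows in memory; full blocks are spilled to a temp file."""

    def __init__(self, cache_size):
        if cache_size <= 0:
            raise ValueError("cache_size must be positive")
        self.cache_size = cache_size
        self._block = []                         # rows currently held in memory
        self._spill = tempfile.TemporaryFile()   # spilled blocks, pickled back to back
        self.spilled_blocks = 0

    def add(self, row):
        # When the in-memory block is full, write it out before accepting the new row.
        if len(self._block) == self.cache_size:
            pickle.dump(self._block, self._spill)
            self.spilled_blocks += 1
            self._block = []
        self._block.append(row)

    def rows(self):
        """Yield every row in insertion order: spilled blocks first, then the in-memory block."""
        self._spill.seek(0)
        try:
            for _ in range(self.spilled_blocks):
                yield from pickle.load(self._spill)
            yield from self._block
        finally:
            # Return to the end of the file so later add() calls append correctly.
            self._spill.seek(0, 2)


conf = {
    JOIN_CACHE_KEY: 25000,     # default stated in the issue
    MAPJOIN_CACHE_KEY: 100,    # illustrative value only
}
cache = BoundedRowCache(cache_size_for("map", conf))
```

With separate keys, lowering the map-side value shrinks the in-memory block for map-side joins without changing how many rows reducer-side joins cache.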

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

