hadoop-hive-dev mailing list archives

From "Soundararajan Velu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
Date Fri, 04 Jun 2010 13:30:57 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875595#action_12875595 ]

Soundararajan Velu commented on HIVE-1139:
------------------------------------------

Thanks Ning. We are trying to implement HashMapWrapper in our solution for Group By and wanted
to know whether this has already been done. It would be great if you could offer some suggestions on this.

> GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-1139
>                 URL: https://issues.apache.org/jira/browse/HIVE-1139
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Arvind Prabhakar
>
> When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct
> keys in main memory. This can lead to an OOM exception when there are too many distinct keys
> for a particular mapper. A workaround is to set the map split size smaller so that each mapper
> processes fewer rows. A better solution is to use the persistent HashMapWrapper (currently
> used in CommonJoinOperator) to spill overflow rows to disk.
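A minimal sketch of the spill idea described above, assuming a map-side COUNT(*) grouped by a String key. The class name SpillingPartialAggregator, its maxKeys limit, and the plain temp-file spill format are hypothetical illustrations, not Hive's GroupByOperator or HashMapWrapper API. Once the in-memory table holds maxKeys groups, rows with new keys are written to disk as raw rows instead of growing the HashMap. Because map-side aggregation is only partial, a reducer that sums partial counts per key still gets correct totals.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch (not Hive's API): map-side partial COUNT(*) with a bounded hash table.
 * At most maxKeys distinct keys are aggregated in memory; a row whose key is not already
 * in memory once that limit is reached is spilled to a temp file as a raw row.
 * Assumes keys contain no line breaks (the spill format is one key per line).
 */
class SpillingPartialAggregator implements Closeable {
    private final int maxKeys;
    private final Map<String, Long> inMemory = new HashMap<>();
    private final Path spillFile;
    private final BufferedWriter spillWriter;
    private long spilledRows = 0;

    public SpillingPartialAggregator(int maxKeys) {
        if (maxKeys <= 0) {
            throw new IllegalArgumentException("maxKeys must be positive");
        }
        this.maxKeys = maxKeys;
        try {
            this.spillFile = Files.createTempFile("groupby-spill", ".txt");
            this.spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Consumes one input row with the given group-by key. */
    public void process(String key) {
        Long current = inMemory.get(key);
        if (current != null) {
            inMemory.put(key, current + 1);   // existing group: update in place
        } else if (inMemory.size() < maxKeys) {
            inMemory.put(key, 1L);            // new group, still within the memory budget
        } else {
            try {                             // budget reached: spill the raw row to disk
                spillWriter.write(key);
                spillWriter.newLine();
                spilledRows++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public long spilledRows() {
        return spilledRows;
    }

    /**
     * Streams partial results to the sink: in-memory aggregates first, then each
     * spilled row as a partial count of 1. Summing partials per key on the reduce
     * side yields the same totals as an unbounded in-memory hash table.
     */
    public void emit(BiConsumer<String, Long> sink) {
        try {
            spillWriter.flush();
            inMemory.forEach(sink);
            try (BufferedReader reader = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line, 1L);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the spill writer and deletes the temp file. */
    @Override
    public void close() {
        try {
            spillWriter.close();
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, with maxKeys = 2 and input keys a, b, a, c, d, c, a, the groups a and b are aggregated in memory (a=3, b=1) while the three rows for c and d are spilled; summing the emitted partials per key gives a=3, b=1, c=2, d=1. Bounding by key count is a simplification, since actual memory use also depends on the size of each key and aggregate.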

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

