hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15580) Replace Spark's groupByKey operator with something with bounded memory
Date Wed, 18 Jan 2017 14:24:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828155#comment-15828155 ]

Xuefu Zhang edited comment on HIVE-15580 at 1/18/17 2:24 PM:
-------------------------------------------------------------

Hi [~lirui], your understanding is correct.

And yes, groupByKey uses unbounded memory. While Spark can spill to disk for groupBy, the
spilling has to happen at key/group boundaries. In other words, one has to have enough
memory to hold any given key group. Thus, for a big key group, Spark can still run out of
memory.

Ref: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-td11427.html
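The memory characteristics described above can be sketched in plain Python (a hypothetical illustration, not actual Spark code): a groupByKey-style operation must buffer every value of a key before the group can be emitted, so memory grows with the largest group even when spilling is available, whereas a reduceByKey/aggregateByKey-style fold keeps only one accumulator per key.

```python
from collections import defaultdict

# Illustrative records; assume keys may be heavily skewed in practice.
records = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]

# groupByKey-style: all values for a key are materialized together, so the
# worst-case memory is O(size of the largest key group). Spilling only helps
# between groups, not within one, which is why a single huge group can OOM.
grouped = defaultdict(list)
for k, v in records:
    grouped[k].append(v)

# reduceByKey-style: each incoming value is folded into a per-key accumulator
# immediately, so memory per key stays O(1) regardless of group size.
summed = {}
for k, v in records:
    summed[k] = summed.get(k, 0) + v

print(dict(grouped))  # every value retained per key
print(summed)         # one accumulator per key
```

This is why, when the downstream logic is an aggregation, replacing groupByKey with a combining operator bounds memory per key instead of per group.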


was (Author: xuefuz):
Hi [~lirui], your understanding is correct.

And yes, groupByKey uses unbounded memory. While Spark can spill to disk for this, the
spilling has to happen at the key/group boundary. For a big key group, Spark can still run
out of memory.

Ref: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-td11427.html

> Replace Spark's groupByKey operator with something with bounded memory
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15580
>                 URL: https://issues.apache.org/jira/browse/HIVE-15580
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, HIVE-15580.2.patch, HIVE-15580.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
