spark-issues mailing list archives

From "Michael Armbrust (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-8890) Reduce memory consumption for dynamic partition insert
Date Fri, 07 Aug 2015 23:25:45 GMT

     [ https://issues.apache.org/jira/browse/SPARK-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-8890.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 8010
[https://github.com/apache/spark/pull/8010]

> Reduce memory consumption for dynamic partition insert
> ------------------------------------------------------
>
>                 Key: SPARK-8890
>                 URL: https://issues.apache.org/jira/browse/SPARK-8890
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Michael Armbrust
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> Currently, InsertIntoHadoopFsRelation can run out of memory if the number of table partitions
> is large. The problem is that we open one output writer for each partition, so when the data
> are randomly ordered and the number of partitions is large, we open a large number of output
> writers at once, leading to OOM.
> The solution here is to inject a sorting operation once the number of active partitions
> goes beyond a certain point (e.g. 50?).
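
The idea described above can be illustrated with a small sketch. This is not the code from pull request 8010; it is a plain-Python simulation under stated assumptions: `write_partitioned`, `key_fn`, and `max_open_writers` are hypothetical names, "writers" are modeled as a set of partition keys, output files are modeled as in-memory lists, and the fallback sort is done in memory where a real system would likely need a sort that can spill to disk.

```python
from itertools import groupby
from operator import itemgetter


def write_partitioned(rows, key_fn, max_open_writers=50):
    """Simulate a dynamic partition insert.

    Returns (rows written per partition, peak number of open writers).
    All names here are illustrative, not Spark APIs.
    """
    written = {}          # partition key -> rows written (stands in for files)
    open_writers = set()  # partition keys that currently hold an open writer
    peak_open = 0
    it = iter(rows)

    for row in it:
        key = key_fn(row)
        if key not in open_writers:
            if len(open_writers) >= max_open_writers:
                # Too many active partitions: close every open writer and
                # fall back to sorting this row plus all remaining input.
                open_writers.clear()
                remaining = sorted([row, *it], key=key_fn)
                for k, group in groupby(remaining, key=key_fn):
                    # Sorted input needs only one open writer at a time.
                    peak_open = max(peak_open, 1)
                    written.setdefault(k, []).extend(group)
                break
            open_writers.add(key)
            peak_open = max(peak_open, len(open_writers))
        written.setdefault(key, []).append(row)

    return written, peak_open


# Usage: 1000 rows spread over 200 partitions, capped at 50 open writers.
rows = [(i % 200, i) for i in range(1000)]
written, peak = write_partitioned(rows, key_fn=itemgetter(0), max_open_writers=50)
print(len(written), peak)
```

The trade-off in this sketch is that input with few distinct partitions never pays for a sort, while input with many partitions is bounded by the threshold instead of by the partition count.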



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

