spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-25947) Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
Date Wed, 07 Nov 2018 01:48:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25947:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25947
>                 URL: https://issues.apache.org/jira/browse/SPARK-25947
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Shuheng Dai
>            Priority: Major
>
> When sorting rows, ShuffleExchangeExec uses the entire row, instead of just the columns
> referenced in the SortOrder, to create the RangePartitioner. This causes the RangePartitioner
> to sample entire rows when computing rangeBounds, which can cause OOM issues on the driver
> when rows contain large fields.
> Create a projection and use only the columns involved in the SortOrder for the RangePartitioner.
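
The idea in the description can be illustrated with a minimal sketch in plain Python. It does not use Spark and is not the actual ShuffleExchangeExec change. The row layout, the `sample`, `range_bounds`, and `payload_chars` helpers, and the sizes are all hypothetical, chosen only to show that projecting to the sort column before sampling yields the same bounds while the sample carries no large fields.

```python
import random

# Hypothetical rows: (sort_key, large_payload). Only sort_key is in the sort order.
rows = [(i, "x" * 10_000) for i in range(1_000)]

def sample(items, k, seed=42):
    """Stand-in for the sampling step; the result is what the driver would hold."""
    return random.Random(seed).sample(items, k)

def range_bounds(keys, num_partitions):
    """Pick num_partitions - 1 evenly spaced split points from the sampled keys."""
    ordered = sorted(keys)
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]

def payload_chars(sampled):
    """Rough size proxy: total characters of string fields in the sample."""
    return sum(len(f) for row in sampled for f in row if isinstance(f, str))

# Before: sample whole rows, then read the sort key from each sampled row.
full_sample = sample(rows, 100)
bounds_before = range_bounds([r[0] for r in full_sample], 4)

# After: project each row to its sort column first, then sample the projection.
key_rows = [(r[0],) for r in rows]
key_sample = sample(key_rows, 100)
bounds_after = range_bounds([r[0] for r in key_sample], 4)

print(payload_chars(full_sample), payload_chars(key_sample))
print(bounds_before == bounds_after)
```

With the same seed, both paths select the same positions, so the computed bounds match. Only the amount of data held in the sample differs.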



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


