spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-18487) Consume all elements for Dataset.show/take to avoid memory leak
Date Fri, 18 Nov 2016 02:11:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang-Chi Hsieh updated SPARK-18487:
------------------------------------
    Description: 
Methods such as Dataset.show and take use Limit (CollectLimitExec), which leverages SparkPlan.executeTake
to efficiently collect the required number of elements back to the driver.

However, under whole-stage codegen, we usually release resources only after all elements are consumed
(e.g., HashAggregate). Because the limit stops reading once it has enough rows, the iterator is never
fully consumed, so the resources are not released. This causes a memory leak with Dataset.show, for example.

We can add a task completion listener to HashAggregate to avoid the memory leak.
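
The following is a minimal toy sketch of the idea, written in Python and not taken from the Spark code base. ToyTaskContext, ToyAggregateIterator, and take are hypothetical names invented for illustration; they only mimic the general shape of an operator that frees a buffer on exhaustion and of a task that runs callbacks when it finishes. It shows that an early-stopping limit leaves the buffer unreleased, and that registering a completion callback releases it anyway.

```python
import itertools


class ToyTaskContext:
    """Hypothetical stand-in for a task context (not Spark's TaskContext API)."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, callback):
        self._listeners.append(callback)

    def mark_completed(self):
        # Run every registered callback once, when the task finishes.
        for callback in self._listeners:
            callback()
        self._listeners.clear()


class ToyAggregateIterator:
    """Toy operator that frees its buffer only when fully consumed."""

    def __init__(self, rows, context=None):
        self._rows = iter(rows)
        self.buffer_released = False
        if context is not None:
            # The fix being sketched: also release when the task completes.
            context.add_completion_listener(self.release)

    def release(self):
        # Safe to call more than once.
        self.buffer_released = True

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            # Normal path: release after the last element is consumed.
            self.release()
            raise


def take(iterator, n):
    """Pull at most n rows and stop, like a limit that ends early."""
    return list(itertools.islice(iterator, n))


# Without a listener, stopping early leaves the buffer unreleased.
leaky = ToyAggregateIterator(range(5))
take(leaky, 2)
print("released without listener:", leaky.buffer_released)

# With a listener, completing the task releases the buffer.
context = ToyTaskContext()
fixed = ToyAggregateIterator(range(5), context)
take(fixed, 2)
context.mark_completed()
print("released with listener:", fixed.buffer_released)
```

The design point is that the release no longer depends on the consumer draining the iterator; it is tied to the end of the task instead, which always happens.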


  was:
Methods such as Dataset.show and take use Limit (CollectLimitExec), which leverages SparkPlan.executeTake
to efficiently collect the required number of elements back to the driver.

However, under whole-stage codegen, we usually release resources only after all elements are consumed
(e.g., HashAggregate). In this case, we will not release the resources, which causes a memory leak
with Dataset.show, for example.

We should consume all elements in the iterator to avoid the memory leak.



> Consume all elements for Dataset.show/take to avoid memory leak
> ---------------------------------------------------------------
>
>                 Key: SPARK-18487
>                 URL: https://issues.apache.org/jira/browse/SPARK-18487
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>
> Methods such as Dataset.show and take use Limit (CollectLimitExec), which leverages SparkPlan.executeTake to efficiently collect the required number of elements back to the driver.
> However, under whole-stage codegen, we usually release resources only after all elements are consumed (e.g., HashAggregate). In this case, we will not release the resources, which causes a memory leak with Dataset.show, for example.
> We can add a task completion listener to HashAggregate to avoid the memory leak.



