hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
Date Tue, 16 Dec 2014 04:33:13 GMT


Xuefu Zhang commented on HIVE-8843:

[~jxiang], thanks for working on this. The change made here seems a little more complicated
and pervasive than I expected. A SparkPlan object holds references to all of its RDDs, including
those being cached, so once the plan is executed, the cached RDDs can be released by going
through the SparkPlan object. The changes will therefore most likely be made in
RemoteHiveSparkClient and LocalHiveSparkClient.
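
To make the suggestion concrete, here is a rough sketch of the idea, not the actual Hive code.
PlanSketch, ClientSketch and Releasable are hypothetical names made up for illustration; in
Spark's Java API the release call on a cached RDD would be JavaRDD.unpersist(). The plan remembers
what it cached, and the client releases it in a finally block after execution, so nothing else
in the code base needs to know about cleanup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cached RDD (assumption: real code would
// hold a JavaRDD and call its unpersist() method).
interface Releasable {
    void unpersist();
}

// Hypothetical plan object that tracks every node it cached while being built.
class PlanSketch {
    private final List<Releasable> cached = new ArrayList<>();

    void registerCached(Releasable node) {
        cached.add(node);
    }

    void execute() {
        // Running the query is omitted in this sketch.
    }

    // Releases every cached node, then forgets them so a second call is a no-op.
    // Returns the number of nodes released by this call.
    int releaseCached() {
        int released = 0;
        for (Releasable node : cached) {
            node.unpersist();
            released++;
        }
        cached.clear();
        return released;
    }
}

// Hypothetical client: the single place that triggers cleanup.
class ClientSketch {
    void run(PlanSketch plan) {
        try {
            plan.execute();
        } finally {
            // Release the cache even if execution throws.
            plan.releaseCached();
        }
    }
}
```

With this shape the cleanup lives next to plan execution in the client, which is why only the
two client classes would need to change.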

> Release RDD cache when Hive query is done [Spark Branch]
> --------------------------------------------------------
>                 Key: HIVE-8843
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Jimmy Xiang
>         Attachments: HIVE-8843.1-spark.patch
> In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext
specific, but the caching is useful only for the query. Thus, once the query is executed,
we need to release the cache by calling RDD.unpersist().

This message was sent by Atlassian JIRA