spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erik Erlandson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21097) Dynamic allocation will preserve cached data
Date Mon, 27 Aug 2018 21:57:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594271#comment-16594271
] 

Erik Erlandson commented on SPARK-21097:
----------------------------------------

I'm wondering if this is going to be subsumed by the Shuffle Service redesign proposal.

cc [~mcheah]

> Dynamic allocation will preserve cached data
> --------------------------------------------
>
>                 Key: SPARK-21097
>                 URL: https://issues.apache.org/jira/browse/SPARK-21097
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager, Scheduler, Spark Core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Brad
>            Priority: Major
>         Attachments: Preserving Cached Data with Dynamic Allocation.pdf
>
>
> We want to use dynamic allocation to distribute resources among many notebook users on
our spark clusters. One difficulty is that if a user has cached data then we are either prevented
from de-allocating any of their executors, or we are forced to drop their cached data, which
can lead to a bad user experience.
> We propose adding a feature to preserve cached data by copying it to other executors
before de-allocation. This behavior would be enabled by a simple spark config. Now when an
executor reaches its configured idle timeout, instead of just killing it on the spot, we will
stop sending it new tasks, replicate all of its rdd blocks onto other executors, and then
kill it. If there is an issue while we replicate the data, like an error, it takes too long,
or there isn't enough space, then we will fall back to the original behavior and drop the
data and kill the executor.
> This feature should allow anyone with notebook users to use their cluster resources more
efficiently. Also since it will be completely opt-in it will unlikely to cause problems for
other use cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message