spark-issues mailing list archives

From "Frederick Reiss (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21084) Improvements to dynamic allocation for notebook use cases
Date Tue, 13 Jun 2017 22:23:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frederick Reiss updated SPARK-21084:
------------------------------------
    Description: 
One important application of Spark is to support many notebook users with a single YARN or
Spark Standalone cluster.  We at IBM have seen this requirement across multiple deployments
of Spark: on-premises and private cloud deployments at our clients, as well as on the IBM
cloud.  The scenario goes something like this: "Every morning at 9am, 500 analysts log into
their computers and start running Spark notebooks intermittently for the next 8 hours." I'm
sure that many other members of the community are interested in making similar scenarios work.
    
Dynamic allocation is supposed to support these kinds of use cases by shifting cluster resources
towards users who are currently executing scalable code.  In our own testing, we have encountered
a number of issues with using the current implementation of dynamic allocation for this purpose:
*Issue #1: Starvation.* A Spark job acquires all available containers, preventing other jobs
or applications from starting.
*Issue #2: Request latency.* Jobs that would normally finish in less than 30 seconds take
2-4x longer than normal with dynamic allocation.
*Issue #3: Unfair resource allocation due to cached data.* Applications that have cached RDD
partitions hold onto executors indefinitely, denying those resources to other applications.
*Issue #4: Loss of cached data leads to thrashing.*  Applications repeatedly lose partitions
of cached RDDs because the underlying executors are removed; the applications then need to
rerun expensive computations.
    
This umbrella JIRA covers efforts to address these issues by making enhancements to Spark.
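
For context, the behaviors described above are governed by Spark's dynamic allocation settings. A minimal spark-defaults.conf sketch (property names as of Spark 2.2; the numeric values are illustrative only, not recommendations from this issue) showing the knobs most relevant to each problem:

```properties
# Enable dynamic allocation; requires the external shuffle service
# on YARN and Standalone so shuffle files survive executor removal.
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true

# Issue #1 (starvation): maxExecutors defaults to unbounded, so one
# job can claim every available container unless a cap is set.
spark.dynamicAllocation.maxExecutors               20

# Issue #2 (request latency): executors are requested only after tasks
# have been backlogged this long, adding ramp-up delay to short jobs.
spark.dynamicAllocation.schedulerBacklogTimeout    1s

# Issues #3/#4 (cached data): idle executors holding cached blocks are
# never reclaimed by default (timeout is infinity), so applications pin
# them indefinitely; a finite timeout frees them but evicts the cached
# partitions, which is the thrashing described in issue #4.
spark.dynamicAllocation.cachedExecutorIdleTimeout  infinity
spark.dynamicAllocation.executorIdleTimeout        60s
```

The tension between the last two settings illustrates why issues #3 and #4 are hard to fix with configuration alone: the same timeout that prevents one also causes the other.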


  was:
One important application of Spark is to support many notebook users with a single YARN or
Spark Standalone cluster.  We at IBM have seen this requirement across multiple deployments
of Spark: on-premises and private cloud deployments at our clients, as well as on the IBM
cloud.  The scenario goes something like this: "Every morning at 9am, 500 analysts log into
their computers and start running Spark notebooks intermittently for the next 8 hours." I'm
sure that many other members of the community are interested in making similar scenarios work.
    
Dynamic allocation is supposed to support these kinds of use cases by shifting cluster resources
towards users who are currently executing scalable code.  In our own testing, we have encountered
a number of issues with using the current implementation of dynamic allocation for this purpose:
*Issue #1: Starvation.* A Spark job acquires all available YARN containers, preventing other
jobs or applications from starting.
*Issue #2: Request latency.* Jobs that would normally finish in less than 30 seconds take
2-4x longer than normal with dynamic allocation.
*Issue #3: Unfair resource allocation due to cached data.* Applications that have cached RDD
partitions hold onto executors indefinitely, denying those resources to other applications.
*Issue #4: Loss of cached data leads to thrashing.*  Applications repeatedly lose partitions
of cached RDDs because the underlying executors are removed; the applications then need to
rerun expensive computations.
    
This umbrella JIRA covers efforts to address these issues by making enhancements to Spark.



> Improvements to dynamic allocation for notebook use cases
> ---------------------------------------------------------
>
>                 Key: SPARK-21084
>                 URL: https://issues.apache.org/jira/browse/SPARK-21084
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Frederick Reiss
>
> One important application of Spark is to support many notebook users with a single YARN
> or Spark Standalone cluster.  We at IBM have seen this requirement across multiple deployments
> of Spark: on-premises and private cloud deployments at our clients, as well as on the IBM
> cloud.  The scenario goes something like this: "Every morning at 9am, 500 analysts log into
> their computers and start running Spark notebooks intermittently for the next 8 hours." I'm
> sure that many other members of the community are interested in making similar scenarios work.
>     
> Dynamic allocation is supposed to support these kinds of use cases by shifting cluster
> resources towards users who are currently executing scalable code.  In our own testing, we
> have encountered a number of issues with using the current implementation of dynamic allocation
> for this purpose:
> *Issue #1: Starvation.* A Spark job acquires all available containers, preventing other
> jobs or applications from starting.
> *Issue #2: Request latency.* Jobs that would normally finish in less than 30 seconds
> take 2-4x longer than normal with dynamic allocation.
> *Issue #3: Unfair resource allocation due to cached data.* Applications that have cached
> RDD partitions hold onto executors indefinitely, denying those resources to other applications.
> *Issue #4: Loss of cached data leads to thrashing.*  Applications repeatedly lose partitions
> of cached RDDs because the underlying executors are removed; the applications then need to
> rerun expensive computations.
>     
> This umbrella JIRA covers efforts to address these issues by making enhancements to Spark.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

