spark-issues mailing list archives

From "Andrew Or (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application
Date Mon, 13 Oct 2014 19:46:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169830#comment-14169830
] 

Andrew Or edited comment on SPARK-3174 at 10/13/14 7:45 PM:
------------------------------------------------------------

[~vanzin]

bq. Are you proposing a change to the current semantics, where Yarn will request "--num-executors"
up front? If you keep that, I think that would cover my above concerns. But switching to a
slow start with no option to pre-allocate a certain numbers seems like it might harm certain
jobs.

I'm actually not proposing to change the application start-up behavior. Spark will continue
to request up front however many executors it does today. The slow start kicks in only when
you want to add executors back after removing them. You can also control how often executors
are added with a config, so an application that wants to request all of its executors at once
can still do that.
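To make the slow-start behavior concrete, here is a minimal sketch of an exponential ramp-up policy of the kind described above; the function name and the doubling schedule are my illustration, not Spark's actual configuration or implementation:

```python
# Hypothetical sketch of the "slow start" add policy: after executors
# have been removed, the number requested per scheduling interval ramps
# up exponentially (1, 2, 4, ...), capped at the original allocation.
# An application that wants everything back at once can effectively
# collapse the schedule by making the interval immediate via the config
# mentioned above.

def slow_start_requests(rounds, max_executors):
    """Executors requested on each consecutive backlogged interval."""
    return [min(1 << r, max_executors) for r in range(rounds)]
```

Under this sketch, an application capped at 8 executors would ramp up as 1, 2, 4, 8 over successive intervals rather than grabbing all 8 at once.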

bq. My second is about the shuffle service you're proposing. Have you investigated whether
it would be possible to make Hadoop's shuffle service more generic, so that Spark can benefit
from it? It does mean that this feature might be constrained to certain versions of Hadoop,
but maybe that's not necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty in integrating Spark and Yarn cleanly there stems from
the hard-coded shuffle file paths and the shuffle index file format. Currently, both are highly
specific to MR, and although we could work around them by adapting Spark's shuffle behavior
to MR's (non-trivial but certainly possible), we would only be able to use the feature on Yarn.
If we decide to extend this feature to standalone or Mesos mode, we'll have to do what we're
doing right now anyway, since we can't rely on the Yarn ShuffleHandler there.
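For readers unfamiliar with the format mismatch being described, here is a simplified sketch of a Spark-style shuffle index file: a flat sequence of (num_partitions + 1) big-endian longs giving cumulative offsets into the data file, so reducer i's bytes live in [offsets[i], offsets[i+1]). MR's ShuffleHandler expects its own per-reducer index records instead, which is the incompatibility discussed above. The function names here are illustrative, not Spark's actual classes:

```python
import io
import struct

def write_index(partition_lengths):
    """Serialize cumulative byte offsets, one long per partition boundary."""
    out = io.BytesIO()
    offset = 0
    out.write(struct.pack(">q", offset))      # partition 0 starts at byte 0
    for length in partition_lengths:
        offset += length
        out.write(struct.pack(">q", offset))  # cumulative end offset
    return out.getvalue()

def lookup(index_bytes, reduce_id):
    """(start, end) byte range of one reduce partition in the data file."""
    start, end = struct.unpack_from(">qq", index_bytes, reduce_id * 8)
    return start, end
```

Because a reducer's range is recovered from two adjacent offsets, the index stays tiny regardless of data size, but any external service serving these files must agree on exactly this layout, which is why sharing MR's handler is not straightforward.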


was (Author: andrewor14):
[~vanzin]

bq. My first question I think is similar to Tom's. It was not clear to me how the app will
behave when it starts up. I'd expect the first job to be the one that has to process the largest
amount of data, so it would benefit from having as many executors as possible available as
quickly as possible - something that seems to conflict with the idea of a slow start.
bq. Are you proposing a change to the current semantics, where Yarn will request "--num-executors"
up front? If you keep that, I think that would cover my above concerns. But switching to a
slow start with no option to pre-allocate a certain numbers seems like it might harm certain
jobs.


> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>         Attachments: SPARK-3174design.pdf, dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that applications have
a fixed allocation that doesn't grow and shrink with their resource needs.  We're blocked
on YARN-1197 for dynamically changing the resources within executors, but we can still allocate
and discard whole executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.
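The two heuristics listed in the description can be sketched as a single per-tick decision function; the action tuples, timeouts, and parameter names below are my illustration, not Spark's actual API:

```python
# Hypothetical sketch of the allocation heuristics: request more
# executors when a task backlog has persisted past a timeout, and
# release executors that have sat idle past another timeout.

def decide(pending_tasks, backlog_secs, idle_secs_by_executor,
           backlog_timeout=5, idle_timeout=60):
    """Return ("request", n) and ("remove", id) actions for one tick."""
    actions = []
    if pending_tasks > 0 and backlog_secs >= backlog_timeout:
        actions.append(("request", 1))               # backlog persisted: scale up
    for executor_id, idle in idle_secs_by_executor.items():
        if idle >= idle_timeout:
            actions.append(("remove", executor_id))  # idle too long: release
    return actions
```

Timeouts rather than instantaneous thresholds keep the policy from thrashing: a momentary backlog or a briefly idle executor triggers nothing.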



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

