spark-issues mailing list archives

From "Karthik Palaniappan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances
Date Fri, 08 Sep 2017 22:56:00 GMT
Karthik Palaniappan created SPARK-21960:
-------------------------------------------

             Summary: Spark Streaming Dynamic Allocation should respect spark.executor.instances
                 Key: SPARK-21960
                 URL: https://issues.apache.org/jira/browse/SPARK-21960
             Project: Spark
          Issue Type: Improvement
          Components: DStreams
    Affects Versions: 2.2.0
            Reporter: Karthik Palaniappan
            Priority: Minor


When Spark Streaming dynamic allocation is enabled, this check enforces that spark.executor.instances (aka --num-executors) is either unset or explicitly set to 0: https://github.com/apache/spark/blob/v2.2.0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L207
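
For reference, the check is roughly of the following shape (a paraphrased sketch, not the verbatim source; the error message here is mine):

{code:scala}
import org.apache.spark.SparkConf

// Paraphrased sketch of the check in the streaming ExecutorAllocationManager
// (illustrative, not the verbatim source).
def isDynamicAllocationEnabled(conf: SparkConf): Boolean = {
  val numExecutors = conf.getInt("spark.executor.instances", 0)
  val streamingDraEnabled =
    conf.getBoolean("spark.streaming.dynamicAllocation.enabled", false)
  // Any non-zero spark.executor.instances is rejected, but an explicit 0
  // slips through, and then no executors are ever requested (see below).
  require(!streamingDraEnabled || numExecutors == 0,
    "spark.executor.instances must be unset or 0 when streaming dynamic " +
      "allocation is enabled")
  streamingDraEnabled
}
{code}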

If spark.executor.instances is unset, the check passes and the property defaults to 2: Spark asks the cluster manager for 2 executors to start with, then adds/removes executors appropriately.
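
For example, a conf like the following (the min/max keys are the standard streaming DRA settings, to my knowledge) starts with 2 executors and scales from there:

{code:scala}
import org.apache.spark.SparkConf

// Works as intended: spark.executor.instances is left unset, so the check
// passes, Spark starts with the default of 2 executors, and streaming
// dynamic allocation scales the count up and down from there.
val conf = new SparkConf()
  .setAppName("streaming-dra-ok")
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.minExecutors", "1")
  .set("spark.streaming.dynamicAllocation.maxExecutors", "10")
{code}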

However, if you explicitly set it to 0, the check also succeeds, but Spark never asks the
cluster manager for any executors. When running on YARN, I repeatedly saw:

{code}
17/08/22 19:35:21 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/08/22 19:35:36 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/08/22 19:35:51 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
{code}

I noticed that at least Google Dataproc and Ambari explicitly set spark.executor.instances
to a positive number, meaning that to use dynamic allocation, you would have to edit spark-defaults.conf
to remove the property. That's obnoxious.
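
To illustrate (a hypothetical conf, not taken from either product), the broken combination looks like this:

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical illustration of the conflict: a distribution ships a
// positive spark.executor.instances in spark-defaults.conf, so merely
// enabling streaming dynamic allocation trips the check above. The only
// workaround is editing spark-defaults.conf to delete the property.
val conf = new SparkConf()
  .set("spark.executor.instances", "2") // set by the distribution's defaults
  .set("spark.streaming.dynamicAllocation.enabled", "true") // user intent
{code}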

In addition, in Spark 2.3, spark-submit will refuse to accept "0" as a value for --num-executors
or --conf spark.executor.instances: https://github.com/apache/spark/commit/0fd84b05dc9ac3de240791e2d4200d8bdffbb01a#diff-63a5d817d2d45ae24de577f6a1bd80f9

It is much more reasonable for Streaming DRA (dynamic resource allocation) to use spark.executor.instances as the initial executor count, just like Core DRA does. I'll open a pull request to remove the check if there are no objections.
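
A minimal sketch of what I have in mind (an assumption about the eventual patch, mirroring how core DRA folds spark.executor.instances into its initial target; the helper name and fallbacks are hypothetical):

{code:scala}
import org.apache.spark.SparkConf

// Sketch of the proposed behavior, not a patch: treat a positive
// spark.executor.instances as the initial executor count instead of
// rejecting it, bounded below by the streaming DRA minimum, with the
// current default of 2 preserved when the property is unset.
def initialStreamingExecutors(conf: SparkConf): Int = {
  val minExecutors =
    conf.getInt("spark.streaming.dynamicAllocation.minExecutors", 0)
  math.max(conf.getInt("spark.executor.instances", 2), minExecutors)
}
{code}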




