spark-dev mailing list archives

From prakhar jauhari <>
Subject Spark driver reducing total executors count even when Dynamic Allocation is disabled.
Date Mon, 19 Oct 2015 07:51:21 GMT
Hey all,

Thanks in advance. I ran into a situation where the Spark driver reduced the
total executor count for my job even with dynamic allocation disabled, which
caused the job to hang forever.

Spark 1.3.1 on a Hadoop YARN 2.4.0 cluster.
All servers in the cluster run Linux kernel 2.6.32.
The job runs in yarn-client mode.

1. The application is running with the required number of executors.
2. One of the DNs loses connectivity and is timed out.
3. Spark issues a killExecutor for the executor on the DN that was timed out.
4. Even with dynamic allocation off, the Spark driver reduces the total
   executor count.
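For reference, this is the kind of configuration under which the behaviour
shows up (the executor count is illustrative): dynamic allocation explicitly
off and a fixed number of executors requested up front.

```
# spark-defaults.conf (illustrative values)
spark.dynamicAllocation.enabled  false
spark.executor.instances         10
```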

On analysing the code (Spark 1.3.1): 

When my DN goes unreachable, Spark core's HeartbeatReceiver invokes
expireDeadHosts(), which checks whether dynamic allocation is supported and
then invokes "sc.killExecutor()":

	if (sc.supportDynamicAllocation) {

Surprisingly, supportDynamicAllocation in SparkContext.scala is defined to
return true if the dynamicAllocationTesting flag is enabled or Spark is
running on YARN:

	private[spark] def supportDynamicAllocation =
	    master.contains("yarn") || dynamicAllocationTesting
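The effect of this predicate can be seen in a tiny standalone sketch (the
predicate below copies the logic quoted above; the object and method names
here are my own, not Spark's):

```scala
// Minimal sketch (not Spark source) of how the 1.3.1 check behaves.
// "master" and "dynamicAllocationTesting" mirror the fields referenced above.
object SupportCheckSketch {
  def supportDynamicAllocation(master: String,
                               dynamicAllocationTesting: Boolean): Boolean =
    master.contains("yarn") || dynamicAllocationTesting

  def main(args: Array[String]): Unit = {
    // Dynamic allocation disabled, but running on YARN:
    // the check still passes, so expireDeadHosts() will call sc.killExecutor().
    println(supportDynamicAllocation("yarn-client", dynamicAllocationTesting = false))
    // Standalone master with dynamic allocation disabled: the check fails.
    println(supportDynamicAllocation("spark://host:7077", dynamicAllocationTesting = false))
  }
}
```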

"sc.killExecutor()" delegates to the configured "schedulerBackend"
(CoarseGrainedSchedulerBackend in this case) and invokes its
"killExecutors()".

CoarseGrainedSchedulerBackend calculates a "newTotal" for the total number
of executors required and sends an update to the application master by
invoking "doRequestTotalExecutors(newTotal)". CoarseGrainedSchedulerBackend
then invokes "doKillExecutors(filteredExecutorIds)" for the lost executors.

Thus the total number of executors is permanently reduced whenever a host
becomes intermittently unreachable, and the lost executor is never replaced.
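The flow just described can be modelled in a toy backend (this is my own
simplified sketch of the behaviour, not Spark source; the class and field
names are illustrative): killing a lost executor also lowers the target total
sent to the application master, so the executor count shrinks for good.

```scala
// Toy model of the 1.3.1 killExecutors flow described above (not Spark source).
object KillFlowSketch {
  final class Backend(var registeredExecutors: Set[String]) {
    // The executor total last requested from the application master.
    var requestedTotal: Int = registeredExecutors.size

    // Mirrors the described behaviour: filter to known executors, compute a
    // lower "newTotal", send it to the AM, then kill the executors.
    def killExecutors(ids: Seq[String]): Unit = {
      val filtered = ids.filter(registeredExecutors.contains)
      val newTotal = registeredExecutors.size - filtered.size
      doRequestTotalExecutors(newTotal) // AM now aims for fewer executors
      doKillExecutors(filtered)
    }

    private def doRequestTotalExecutors(n: Int): Unit = requestedTotal = n
    private def doKillExecutors(ids: Seq[String]): Unit =
      registeredExecutors = registeredExecutors -- ids
  }

  def main(args: Array[String]): Unit = {
    val backend = new Backend(Set("exec-1", "exec-2", "exec-3"))
    backend.killExecutors(Seq("exec-2")) // DN hosting exec-2 timed out
    println(backend.requestedTotal)      // the total shrank; exec-2 is not replaced
  }
}
```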

I noticed that this change to "CoarseGrainedSchedulerBackend" was introduced
while fixing:

I am new to this code, so it would be a great help if any of you could
comment on why we need "doRequestTotalExecutors" inside "killExecutors".
Also, why is "supportDynamicAllocation" true even though I have not enabled
dynamic allocation?

