spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "prakhar jauhari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.
Date Mon, 19 Oct 2015 07:30:05 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962961#comment-14962961
] 

prakhar jauhari commented on SPARK-11181:
-----------------------------------------

On analysing the code (Spark 1.3.1): 
When my DN goes unreachable: 

Spark core's HeartbeatReceiver invokes _expireDeadHosts()_: which checks if Dynamic Allocation
is supported and then invokes _"sc.killExecutor()"_
{quote}
		if (sc.supportDynamicAllocation) \{
          	    sc.killExecutor(executorId)
                }
{quote} 

Surprisingly _supportDynamicAllocation_ in _sparkContext.scala_ is defined to result "True"
if _dynamicAllocationTesting_ flag is enabled or spark is running over _yarn_  
{quote}
		private\[spark\] def supportDynamicAllocation = 
    		    master.contains("yarn") || dynamicAllocationTesting
{quote}	

_"sc.killExecutor()"_ matches it to configured _"schedulerBackend"_ (CoarseGrainedSchedulerBackend
in this case) and invokes _"killExecutors(executorIds)"_

CoarseGrainedSchedulerBackend calculates a _"newTotal"_ for the total number of executors
required, and sends a update to application master by invoking _"doRequestTotalExecutors(newTotal)"_
 
CoarseGrainedSchedulerBackend then invokes a _"doKillExecutors(filteredExecutorIds)"_ for
the lost executors. 

Thus reducing the total number of executors in a host intermittently unreachable scenario.


> Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11181
>                 URL: https://issues.apache.org/jira/browse/SPARK-11181
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core, YARN
>    Affects Versions: 1.3.1
>         Environment: Spark-1.3.1 on hadoop-yarn-2.4.0 cluster. 
> All servers in cluster running Linux version 2.6.32. 
> Job in yarn-client mode.
>            Reporter: prakhar jauhari
>             Fix For: 1.3.2
>
>
> Spark driver reduces total executors count even when Dynamic Allocation is not enabled.
> To reproduce this:
> 1. A 2 node yarn setup : each DN has ~ 20GB mem and 4 cores.
> 2. When the application launches and gets it required executors, One of the DN's losses
connectivity and is timed out.
> 3. Spark issues a killExecutor for the executor on the DN which was timed out. 
> 4. Even with dynamic allocation off, spark's scheduler reduces the "targetNumExecutors".
> 5. Thus the job runs with reduced executor count.
> Note : The severity of the issue increases : If some of the DN that were running my job's
executors lose connectivity intermittently, spark scheduler reduces "targetNumExecutors",
thus not asking for new executors on any other nodes, causing the job to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message