spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-19438) executorDataMap should be guarded by CoarseGrainedSchedulerBackend.this.synchronized
Date Thu, 02 Feb 2017 16:02:51 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19438:
------------------------------------

    Assignee:     (was: Apache Spark)

> executorDataMap should be guarded by CoarseGrainedSchedulerBackend.this.synchronized
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-19438
>                 URL: https://issues.apache.org/jira/browse/SPARK-19438
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>
> Currently, when *RegisterExecutor* is handled in *CoarseGrainedSchedulerBackend*, *executorDataMap*
is guarded by *CoarseGrainedSchedulerBackend.this.synchronized* only when it is updated, not when
it is checked for an existing executor ID, which can leave *numPendingExecutors* incorrect.
> The code looks like this:
> {code}
>         if (executorDataMap.contains(executorId)) {
>           executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
>           context.reply(true)
>         } else {
>           ...
>           CoarseGrainedSchedulerBackend.this.synchronized {
>             executorDataMap.put(executorId, data)
>             if (currentExecutorIdCounter < executorId.toInt) {
>               currentExecutorIdCounter = executorId.toInt
>             }
>             if (numPendingExecutors > 0) {
>               numPendingExecutors -= 1
>               logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
>             }
>           }
> {code}
> Consider SPARK-19437 and the following scenario:
> An executor sends *RegisterExecutor* twice via *askWithRetry*, and the interval between
the two messages is very small. Both may then pass the *contains* check and take the *else* branch,
so *numPendingExecutors* is decremented twice. Currently, *askWithRetry* of *RegisterExecutor*
is used only in some unit tests, but it makes sense to make the handling of *RegisterExecutor* more robust.
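
One possible way to close this window, sketched under assumptions, is to perform the duplicate check and the update inside the same synchronized block. The ExecutorRegistry class, its members, and the demo object below are hypothetical stand-ins for illustration only; they are not Spark's actual classes, and this is not necessarily the change the project adopted.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical stand-in for the backend state; not Spark's actual class.
class ExecutorRegistry(initialPending: Int) {
  private val executorDataMap = mutable.HashMap.empty[String, String]
  private var numPendingExecutors = initialPending

  // The contains check and the update share one lock, so two
  // registrations of the same ID cannot both reach the else branch.
  // Returns true for a new registration, false for a duplicate ID.
  def register(executorId: String, data: String): Boolean = this.synchronized {
    if (executorDataMap.contains(executorId)) {
      false
    } else {
      executorDataMap.put(executorId, data)
      if (numPendingExecutors > 0) {
        numPendingExecutors -= 1
      }
      true
    }
  }

  def pending: Int = this.synchronized { numPendingExecutors }
}

object ExecutorRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new ExecutorRegistry(initialPending = 2)
    val accepted = new AtomicInteger(0)

    // Two threads race to register the same executor ID.
    val threads = (1 to 2).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          if (registry.register("1", "host-a")) accepted.incrementAndGet()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())

    // Exactly one registration is accepted, so the counter drops once.
    println(s"accepted = ${accepted.get}, pending = ${registry.pending}")
  }
}
```

With the check inside the lock, the second registration always observes the entry written by the first, so the pending counter is decremented only once per executor ID.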




