spark-commits mailing list archives

From zsxw...@apache.org
Subject spark git commit: [SPARK-15262] Synchronize block manager / scheduler executor state
Date Wed, 11 May 2016 20:37:09 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 6e08eb469 -> 2454f6abf


[SPARK-15262] Synchronize block manager / scheduler executor state

## What changes were proposed in this pull request?

If an executor is still alive even after the scheduler has removed its metadata, we may receive
a heartbeat from that executor and tell its block manager to reregister itself. If that happens,
the block manager master will know about the executor, but the scheduler will not.

That is a dangerous situation, because when the executor does get disconnected later, the
scheduler will not ask the block manager to also remove metadata for that executor. Later,
when we try to clean up an RDD or a broadcast variable, we may try to send a message to that
executor, triggering an exception.

## How was this patch tested?

Jenkins.

Author: Andrew Or <andrew@databricks.com>

Closes #13055 from andrewor14/block-manager-remove.

(cherry picked from commit 40a949aae9c3040019a52482d091912a85b0f4d4)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2454f6ab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2454f6ab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2454f6ab

Branch: refs/heads/branch-2.0
Commit: 2454f6abf29c938420dda8319a4e4afd758fc4e3
Parents: 6e08eb4
Author: Andrew Or <andrew@databricks.com>
Authored: Wed May 11 13:36:58 2016 -0700
Committer: Shixiong Zhu <shixiong@databricks.com>
Committed: Wed May 11 13:37:05 2016 -0700

----------------------------------------------------------------------
 .../scheduler/cluster/CoarseGrainedSchedulerBackend.scala   | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2454f6ab/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index 8896391..0fea9c1 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -289,7 +289,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           scheduler.executorLost(executorId, if (killed) ExecutorKilled else reason)
           listenerBus.post(
             SparkListenerExecutorRemoved(System.currentTimeMillis(), executorId, reason.toString))
-        case None => logInfo(s"Asked to remove non-existent executor $executorId")
+        case None =>
+          // SPARK-15262: If an executor is still alive even after the scheduler has removed
+          // its metadata, we may receive a heartbeat from that executor and tell its block
+          // manager to reregister itself. If that happens, the block manager master will know
+          // about the executor, but the scheduler will not. Therefore, we should remove the
+          // executor from the block manager when we hit this case.
+          scheduler.sc.env.blockManager.master.removeExecutor(executorId)
+          logInfo(s"Asked to remove non-existent executor $executorId")
       }
     }
 
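The scenario in the commit message, and the effect of the new `case None` branch, can be illustrated with a small Scala model. This is a hypothetical sketch, not Spark code: `Driver`, `schedulerExecutors`, `blockManagerExecutors`, `onHeartbeat`, `unreachableForCleanup`, and `scenario` are invented names, and the two mutable sets merely stand in for the scheduler's executor metadata and the block manager master's registry.

```scala
import scala.collection.mutable

// Hypothetical toy model of the two executor registries on the driver.
// None of these names are Spark's real classes or methods.
final class Driver(patched: Boolean) {
  val schedulerExecutors = mutable.Set.empty[String]    // scheduler's metadata
  val blockManagerExecutors = mutable.Set.empty[String] // block manager master's metadata

  def registerExecutor(id: String): Unit = {
    schedulerExecutors += id
    blockManagerExecutors += id
  }

  // Assumption: a heartbeat from a still-alive executor leads its block manager
  // to re-register, which updates only the block manager master's registry.
  def onHeartbeat(id: String): Unit = blockManagerExecutors += id

  // Mirrors the `case Some` / `case None` branches shown in the diff above.
  def removeExecutor(id: String): Unit =
    if (schedulerExecutors.remove(id)) {
      // Known executor. In Spark the block manager master is cleared indirectly
      // via scheduler.executorLost; here it is removed directly for simplicity.
      blockManagerExecutors -= id
    } else if (patched) {
      // SPARK-15262: an executor unknown to the scheduler is still removed
      // from the block manager master.
      blockManagerExecutors -= id
    }

  // RDD/broadcast cleanup targets every executor the block manager master knows;
  // returns the ones that are no longer alive.
  def unreachableForCleanup(alive: Set[String]): Set[String] =
    blockManagerExecutors.toSet -- alive
}

def scenario(patched: Boolean): Set[String] = {
  val d = new Driver(patched)
  d.registerExecutor("exec-1")
  d.removeExecutor("exec-1") // scheduler drops the executor, but the process lives on
  d.onHeartbeat("exec-1")    // block manager re-registers; the scheduler is unaware
  d.removeExecutor("exec-1") // executor finally disconnects
  d.unreachableForCleanup(alive = Set.empty)
}

println(scenario(patched = false)) // Set(exec-1)
println(scenario(patched = true))  // Set()
```

Without the extra removal, the block manager master keeps a stale entry, so a later RDD or broadcast cleanup would still try to message the disconnected executor; with it, the two registries agree again.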

