spark-commits mailing list archives

From pwend...@apache.org
Subject git commit: SPARK-1097: Do not introduce deadlock while fixing concurrency bug
Date Wed, 16 Jul 2014 21:10:38 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.0 bf1ddc7b8 -> 91e7a71c6


SPARK-1097: Do not introduce deadlock while fixing concurrency bug

We recently added a lock on 'conf' to prevent concurrent creation of JobConfs. However, it
turns out that this can introduce a deadlock, because Hadoop also synchronizes on Configuration
objects when creating new Configurations (it does so via a static REGISTRY that contains
all created Configurations).

This fix forces all Spark initialization of Configuration objects to occur serially, using
a static lock that we control, and thus avoids introducing the deadlock.
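The pattern behind the fix can be sketched outside Spark as follows. This is a minimal illustration in plain Java under stated assumptions: ConfFactory and newConf are hypothetical names, and java.util.Properties stands in for Hadoop's Configuration — it is not Spark's actual HadoopRDD code. The key point is that a private static lock object cannot participate in a lock-ordering cycle with monitors held inside another library.

```java
import java.util.Properties;

// Sketch: serialize all construction of a non-thread-safe object behind a
// single static lock that only our code knows about. Because no external
// library can synchronize on this private lock, it cannot form a deadlock
// cycle with locks that library takes internally (as happened when Spark
// synchronized on 'conf' itself while Hadoop synchronized on Configuration
// objects via its static REGISTRY).
public class ConfFactory {
    // Private lock object controlled solely by this class.
    private static final Object CONSTRUCTION_LOCK = new Object();

    // All "Configuration" construction goes through this one lock,
    // so at most one copy is ever being built at a time.
    public static Properties newConf(Properties base) {
        synchronized (CONSTRUCTION_LOCK) {
            Properties copy = new Properties();
            copy.putAll(base);
            return copy;
        }
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("k", "v");
        // The copy is independent of 'base' and safe to build concurrently
        // from many threads, since construction is serialized above.
        System.out.println(newConf(base).getProperty("k"));
    }
}
```

The trade-off is that all construction is serialized through one lock, which the commit message accepts as the price of avoiding the lock-ordering cycle; construction is cheap relative to the cached reuse that follows.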

Author: Aaron Davidson <aaron@databricks.com>

Closes #1409 from aarondav/1054 and squashes the following commits:

7d1b769 [Aaron Davidson] SPARK-1097: Do not introduce deadlock while fixing concurrency bug
(cherry picked from commit 8867cd0bc2961fefed84901b8b14e9676ae6ab18)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91e7a71c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91e7a71c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91e7a71c

Branch: refs/heads/branch-1.0
Commit: 91e7a71c68eb9ff0738c21bc7525fa89bd662993
Parents: bf1ddc7
Author: Aaron Davidson <aaron@databricks.com>
Authored: Wed Jul 16 14:10:17 2014 -0700
Committer: Patrick Wendell <pwendell@gmail.com>
Committed: Wed Jul 16 14:10:33 2014 -0700

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/91e7a71c/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
index a55b226..d0a2241 100644
--- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
@@ -139,8 +139,8 @@ class HadoopRDD[K, V](
       // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
       // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
       // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
-      // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456)
-      conf.synchronized {
+      // Synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456).
+      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         val newJobConf = new JobConf(conf)
         initLocalJobConfFuncOpt.map(f => f(newJobConf))
         HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
@@ -231,6 +231,9 @@ class HadoopRDD[K, V](
 }
 
 private[spark] object HadoopRDD {
+  /** Constructing Configuration objects is not threadsafe, use this lock to serialize. */
+  val CONFIGURATION_INSTANTIATION_LOCK = new Object()
+
   /**
    * The three methods below are helpers for accessing the local map, a property of the SparkEnv of
    * the local process.

