spark-issues mailing list archives

From "Oleksii Kostyliev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7233) ClosureCleaner#clean blocks concurrent job submitter threads
Date Wed, 29 Apr 2015 11:39:07 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519169#comment-14519169 ]

Oleksii Kostyliev commented on SPARK-7233:
------------------------------------------

To illustrate the issue, I ran a test against a local Spark instance.
Attached is a screenshot of the Threads view in the YourKit profiler.
The test generated only 20 concurrent requests.
As the screenshot shows, the job submitter threads spend most of their time blocked by one another.
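
The contention described here can be approximated outside Spark with a small harness. The following Java sketch is not the test that produced the attached screenshot; it only assumes that the hot spot is a repeated {{Class.forName}} lookup of a class that is absent from the classpath. The names `ForNameContention`, `run`, and `example.DoesNotExist` are hypothetical. Whether the threads actually show up as BLOCKED depends on the JVM and class loader in use, so the harness is meant to be run under a profiler rather than to prove anything by itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ForNameContention {
    // Hypothetical class name that is assumed NOT to exist on the classpath.
    static final String MISSING_CLASS = "example.DoesNotExist";

    // Starts `threads` workers that each attempt the lookup `iterations` times.
    // Returns the total number of failed lookups (ClassNotFoundException).
    static int run(int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger misses = new AtomicInteger();

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    try {
                        Class.forName(MISSING_CLASS);
                    } catch (ClassNotFoundException e) {
                        misses.incrementAndGet();
                    }
                }
            });
        }

        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return misses.get();
    }
}
```

Calling `ForNameContention.run(20, 50)` mirrors the 20 concurrent requests mentioned above; attaching a profiler while it runs shows the state of each worker thread.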

> ClosureCleaner#clean blocks concurrent job submitter threads
> ------------------------------------------------------------
>
>                 Key: SPARK-7233
>                 URL: https://issues.apache.org/jira/browse/SPARK-7233
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1, 1.4.0
>            Reporter: Oleksii Kostyliev
>         Attachments: blocked_threads_closurecleaner.png
>
>
> The {{org.apache.spark.util.ClosureCleaner#clean}} method contains logic to determine whether
> Spark is running in interpreter mode: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L120
> While this check is valuable in particular situations, it also causes concurrent submitter
> threads to block on a native call to {{java.lang.Class#forName0}}, since it appears that only
> one thread at a time can make the call.
> This becomes a major issue when multiple threads concurrently submit short-lived jobs. This is
> one of the ways we use Spark in production, and the number of parallel requests is expected to
> be quite high, up to a couple of thousand at a time.
> A typical stacktrace of a blocked thread looks like:
> {code}
> http-bio-8091-exec-14 [BLOCKED] [DAEMON]
> java.lang.Class.forName0(String, boolean, ClassLoader, Class) Class.java (native)
> java.lang.Class.forName(String) Class.java:260
> org.apache.spark.util.ClosureCleaner$.clean(Object, boolean) ClosureCleaner.scala:122
> org.apache.spark.SparkContext.clean(Object, boolean) SparkContext.scala:1623
> org.apache.spark.rdd.RDD.reduce(Function2) RDD.scala:883
> org.apache.spark.rdd.RDD.takeOrdered(int, Ordering) RDD.scala:1240
> org.apache.spark.api.java.JavaRDDLike$class.takeOrdered(JavaRDDLike, int, Comparator) JavaRDDLike.scala:586
> org.apache.spark.api.java.AbstractJavaRDDLike.takeOrdered(int, Comparator) JavaRDDLike.scala:46
> ...
> {code}
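
One possible way to keep submitter threads from repeating the lookup on every call is to perform it once and cache the result. The Java sketch below is not the fix adopted in Spark. It assumes the check only needs to know whether a given class is present on the classpath, which may not cover everything the real check in {{ClosureCleaner}} does. `InterpreterCheck`, `probe`, `replPresent`, and the class name `spark.repl.Main` are illustrative placeholders, not taken from the issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class InterpreterCheck {
    // Counts how often the lookup really ran; present only to make the caching visible.
    static final AtomicInteger lookups = new AtomicInteger();

    // Placeholder class name (an assumption, not quoted from the issue).
    static final String REPL_CLASS = "spark.repl.Main";

    // Initialization-on-demand holder: the JVM runs this static initializer
    // at most once, on first access, even when many threads race to read it.
    private static final class Holder {
        static final boolean REPL_PRESENT = probe(REPL_CLASS);
    }

    // Performs the actual lookup; returns true if the class can be loaded.
    static boolean probe(String className) {
        lookups.incrementAndGet();
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Cheap accessor for callers: reads the cached answer after the first call.
    static boolean replPresent() {
        return Holder.REPL_PRESENT;
    }
}
```

With this shape, only the first caller pays for {{Class.forName}}; later callers read a static field. The trade-off is that the answer is fixed for the lifetime of the class, so it fits only if the presence of the class cannot change while the application is running.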



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


