spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Hang on Executor classloader lookup for the remote REPL URL classloader
Date Wed, 07 Jan 2015 08:31:49 GMT
Hey Andrew,

So the executors in Spark fetch classes defined in the repl from an
HTTP server running on the driver. Is this happening in the context of
a repl session? Also, is it deterministic, or does it happen only
occasionally?

The reason all of the other threads are hanging is that classloading
is serialized on a lock (you can see the locked ExecutorClassLoader
monitor in your dump), so they all queue up behind the stuck load.
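That queuing behavior is easy to reproduce outside Spark. Here is a
minimal, self-contained Java sketch (not Spark code; the loader and
class names are made up) of how a classloader that is not
parallel-capable serializes all loads on the loader instance's
monitor, so one findClass stuck on network I/O leaves every other
loading thread BLOCKED, just like the second thread dump below:

```java
import java.util.concurrent.CountDownLatch;

public class LoaderLockDemo {
    // Hypothetical stand-in for ExecutorClassLoader waiting on the
    // driver's HTTP class server: findClass stalls for two seconds.
    static class SlowLoader extends ClassLoader {
        final CountDownLatch inFindClass = new CountDownLatch(1);
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            inFindClass.countDown();  // signal that we hold the loader lock
            try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
            throw new ClassNotFoundException(name);
        }
    }

    // Kick off a classload on its own thread, as each task thread does.
    static Thread loadInThread(ClassLoader loader, String name) {
        Thread t = new Thread(() -> {
            try { Class.forName(name, true, loader); }
            catch (ClassNotFoundException ignored) {}
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        SlowLoader loader = new SlowLoader();
        Thread t1 = loadInThread(loader, "no.such.ClassA");
        loader.inFindClass.await();   // t1 is inside findClass, holding the lock
        Thread t2 = loadInThread(loader, "no.such.ClassB");
        // t2 queues on the loader's monitor; poll until we observe it.
        while (t2.getState() != Thread.State.BLOCKED) Thread.sleep(10);
        System.out.println("t2 is " + t2.getState() + " behind t1's classload");
        t1.join(); t2.join();
    }
}
```

Because SlowLoader never registers itself as parallel-capable,
ClassLoader.loadClass locks on the loader instance itself for every
class name, which is the same shape as the lock on the
ExecutorClassLoader object in the dump.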

Could you attach the full stack trace from the driver? Is it possible
that something in the network is blocking the transfer of bytes
between the two processes? Based on the stack trace, the executor has
sent an HTTP request and is waiting on the response from the driver.

One thing to check is whether the TCP connection used for the repl
class server is still alive from the vantage point of both the
executor and driver nodes. Another thing to try is temporarily opening
up any firewalls on the nodes or in the network and seeing whether the
problem goes away (to isolate it to a network issue exogenous to
Spark).
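Roughly, those checks look like this from the executor node (a sketch
only: <driver-host> and the port are placeholders, since the repl
class server binds to whatever address and port the driver advertises
via spark.repl.class.uri):

```
# Is there still an established TCP connection to the driver's
# repl class server? Run the same check on the driver side too.
netstat -tn | grep <driver-host>

# Can the executor fetch anything over HTTP from that server right now?
curl -v http://<driver-host>:<class-server-port>/
```

If the connection shows as established on one side but not the other,
that points at something in the middle (firewall, NAT) silently
dropping the connection.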

- Patrick

On Wed, Aug 20, 2014 at 11:35 PM, Andrew Ash <andrew@andrewash.com> wrote:
> Hi Spark devs,
>
> I'm seeing a stack trace where the classloader that reads from the REPL is
> hung, blocking all progress on that executor.  Below is that hung
> thread's stack trace, along with the stack trace of another blocked thread.
>
> I thought maybe there was an issue with the REPL's JVM on the other side,
> but didn't see anything useful in that stack trace either.
>
> Any ideas what I should be looking for?
>
> Thanks!
> Andrew
>
>
> "Executor task launch worker-0" daemon prio=10 tid=0x00007f780c208000 nid=0x6ae9 runnable [0x00007f78c2eeb000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         - locked <0x00007f7e13ea9560> (a java.io.BufferedInputStream)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>         - locked <0x00007f7e13e9eeb0> (a sun.net.www.protocol.http.HttpURLConnection)
>         at java.net.URL.openStream(URL.java:1037)
>         at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:86)
>         at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:63)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         - locked <0x00007f7fc9018980> (a org.apache.spark.repl.ExecutorClassLoader)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:270)
>         at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:102)
>         at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:82)
>         at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:132)
>         at org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:69)
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:126)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:59)
>         at org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
>         at org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>
>
> And the other threads are stuck on the Class.forName0() method too:
>
> "Executor task launch worker-4" daemon prio=10 tid=0x00007f780c20f000 nid=0x6aed waiting for monitor entry [0x00007f78c2ae8000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:270)
>         at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:102)
>         at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:79)
>         at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:132)
>         at org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:69)
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:126)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:59)
>         at org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
>         at org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>


