spark-issues mailing list archives

From "Brandon Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-8905) Spark Streaming receiving socket data sporadically on Mesos 0.22.1
Date Wed, 08 Jul 2015 19:57:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619256#comment-14619256 ]

Brandon Bradley commented on SPARK-8905:
----------------------------------------

I also tested Spark 1.3.1 against Mesos 0.22.1. Both examples still exhibit the same behavior.

> Spark Streaming receiving socket data sporadically on Mesos 0.22.1
> ------------------------------------------------------------------
>
>                 Key: SPARK-8905
>                 URL: https://issues.apache.org/jira/browse/SPARK-8905
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Brandon Bradley
>
> Hello!
> When submitting the PySpark Streaming example network_wordcount.py to a Mesos cluster, I
> encounter a few 'Return Message: null' errors followed by many py4j 'Error while obtaining
> a new communication channel' errors. If I let the job run for a long time, there are more
> py4j errors than my ~2000-line scrollback buffer can hold. (The example itself is sketched
> after the log below.)
> {code}
> 15/07/08 14:27:16 ERROR JobScheduler: Error running job streaming job 1436383595000 ms.0
> py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: null
>         at py4j.Protocol.getReturnValue(Protocol.java:417)
>         at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:113)
>         at com.sun.proxy.$Proxy14.call(Unknown Source)
>         at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/07/08 14:27:16 ERROR JobScheduler: Error running job streaming job 1436383596000 ms.0
> py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: null
>         [same stack trace as above]
> 15/07/08 14:27:16 ERROR JobScheduler: Error running job streaming job 1436383597000 ms.0
> py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: null
>         [same stack trace as above]
> 15/07/08 14:27:16 ERROR JobScheduler: Error running job streaming job 1436383598000 ms.0
> py4j.Py4JException: Error while obtaining a new communication channel
>         at py4j.CallbackClient.getConnectionLock(CallbackClient.java:155)
>         at py4j.CallbackClient.sendCommand(CallbackClient.java:229)
>         at py4j.CallbackClient.sendCommand(CallbackClient.java:240)
>         at py4j.CallbackClient.sendCommand(CallbackClient.java:240)
>         at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:111)
>         at com.sun.proxy.$Proxy14.call(Unknown Source)
>         at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:579)
>         at java.net.Socket.connect(Socket.java:528)
>         at java.net.Socket.<init>(Socket.java:425)
>         at java.net.Socket.<init>(Socket.java:241)
>         at py4j.CallbackConnection.start(CallbackConnection.java:104)
>         at py4j.CallbackClient.getConnection(CallbackClient.java:134)
>         at py4j.CallbackClient.getConnectionLock(CallbackClient.java:146)
>         ... 25 more
> 15/07/08 14:27:16 ERROR JobScheduler: Error running job streaming job 1436383599000 ms.0
> py4j.Py4JException: Error while obtaining a new communication channel
>         [same stack trace, again caused by java.net.ConnectException: Connection refused]
> {code}
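> For reference, the submitted example boils down to the following (paraphrased from
> examples/src/main/python/streaming/network_wordcount.py in the Spark source tree; the exact
> code may differ slightly between releases):
> {code}
> from __future__ import print_function
>
> import sys
>
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
>
> if __name__ == "__main__":
>     if len(sys.argv) != 3:
>         print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr)
>         exit(-1)
>     sc = SparkContext(appName="PythonStreamingNetworkWordCount")
>     ssc = StreamingContext(sc, 1)  # 1-second batch interval
>
>     # Read lines from the socket, split into words, count per batch.
>     lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
>     counts = lines.flatMap(lambda line: line.split(" ")) \
>                   .map(lambda word: (word, 1)) \
>                   .reduceByKey(lambda a, b: a + b)
>     counts.pprint()  # output action; this drives the foreachRDD callback in the traces
>
>     ssc.start()
>     ssc.awaitTermination()
> {code}
> The failing frames (TransformFunction.apply / callForeachRDD) correspond to the pprint()
> output step, which calls back from the JVM into the Python driver over py4j; the
> 'Connection refused' suggests that callback connection to the Python process is what is
> being lost.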
> When running the Scala Streaming examples there are no errors, but data received from netcat
> takes a few minutes to show up in the driver output. I believe these two issues are related.
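> In case it helps anyone reproduce this, I feed the examples with plain netcat (nc -lk 9999);
> a minimal Python stand-in that does the same thing (assuming port 9999, the port I use with
> the examples) is:
> {code}
> # Line-oriented socket server: forwards stdin lines to the first client
> # that connects, roughly equivalent to `nc -lk 9999` for one connection.
> import socket
> import sys
>
> server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> server.bind(("", 9999))  # assumption: the port passed to the example
> server.listen(1)
> conn, addr = server.accept()
> for line in sys.stdin:
>     conn.sendall(line.encode("utf-8"))
> conn.close()
> server.close()
> {code}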
> Cheers!
> Brandon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
