spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Velu nambi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-10361) model.predictAll() fails at user_product.first()
Date Mon, 31 Aug 2015 07:22:46 GMT
Velu nambi created SPARK-10361:
----------------------------------

             Summary: model.predictAll() fails at user_product.first()
                 Key: SPARK-10361
                 URL: https://issues.apache.org/jira/browse/SPARK-10361
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 1.4.1, 1.3.1, 1.5.0
         Environment: Windows 10, Python 2.7 and with all the three versions of Spark
            Reporter: Velu nambi


This code, adapted from the documentation, fails when calling PredictAll() after an ALS.train()


15/08/31 00:11:45 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at java.io.FilterOutputStream.write(Unknown Source)
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
15/08/31 00:11:45 ERROR PythonRDD: This may have been caused by a prior exception:
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at java.io.FilterOutputStream.write(Unknown Source)
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
15/08/31 00:11:45 ERROR Executor: Exception in task 0.0 in stage 187.0 (TID 85)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at java.io.FilterOutputStream.write(Unknown Source)
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
15/08/31 00:11:45 WARN TaskSetManager: Lost task 0.0 in stage 187.0 (TID 85, localhost): java.net.SocketException:
Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at java.io.FilterOutputStream.write(Unknown Source)
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

15/08/31 00:11:45 ERROR TaskSetManager: Task 0 in stage 187.0 failed 1 times; aborting job
15/08/31 00:11:45 INFO TaskSchedulerImpl: Removed TaskSet 187.0, whose tasks have all completed,
from pool 
15/08/31 00:11:45 INFO TaskSchedulerImpl: Cancelling stage 187
15/08/31 00:11:45 INFO DAGScheduler: ResultStage 187 (runJob at PythonRDD.scala:366) failed
in 0.427 s
15/08/31 00:11:45 INFO DAGScheduler: Job 16 failed: runJob at PythonRDD.scala:366, took 0.434198
s
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.3\helpers\pydev\pydevd.py",
line 2358, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.3\helpers\pydev\pydevd.py",
line 1778, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:/Users/thirumalaiv1/PycharmProjects/MovieLensALS/MovieLensALS.py", line 137, in
<module>
    validationRmse = computeRmse(model, validation, numValidation)
  File "C:/Users/thirumalaiv1/PycharmProjects/MovieLensALS/MovieLensALS.py", line 49, in computeRmse
    predictions = model.predictAll(data).map(lambda x: ((x[0], x[1]), x[2]))
  File "C:\spark-1.4.1\python\pyspark\mllib\recommendation.py", line 126, in predictAll
    first = user_product.first()
  File "C:\spark-1.4.1\python\pyspark\rdd.py", line 1295, in first
    rs = self.take(1)
  File "C:\spark-1.4.1\python\pyspark\rdd.py", line 1277, in take
    res = self.context.runJob(self, takeUpToNumLeft, p, True)
  File "C:\spark-1.4.1\python\pyspark\context.py", line 897, in runJob
    allowLocal)
  File "C:\Users\thirumalaiv1\PyCharmVirtualEnv\MovieLensALSVirtEnv\lib\site-packages\py4j\java_gateway.py",
line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\thirumalaiv1\PyCharmVirtualEnv\MovieLensALSVirtEnv\lib\site-packages\py4j\protocol.py",
line 308, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 187.0
failed 1 times, most recent failure: Lost task 0.0 in stage 187.0 (TID 85, localhost): java.net.SocketException:
Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at java.io.FilterOutputStream.write(Unknown Source)
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message