spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-6071) ALS doc example fails randomly in PythonAccumulatorParam
Date Fri, 27 Feb 2015 23:06:07 GMT
Joseph K. Bradley created SPARK-6071:
----------------------------------------

             Summary: ALS doc example fails randomly in PythonAccumulatorParam
                 Key: SPARK-6071
                 URL: https://issues.apache.org/jira/browse/SPARK-6071
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley
            Priority: Minor


When running the ALS example in [http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples]
on branch-1.3, I hit an intermittent failure that I have been unable to reproduce.

Specifically, I was running on the branch from this PR [https://github.com/apache/spark/pull/4811]
at this commit: [https://github.com/mengxr/spark/commit/06140a48ec5bd55b329e9b7cf658bd3e43be4fe2]

However, that PR should not have affected this code path, so I suspect the bug is in branch-1.3 itself.

After a clean build, I ran:
{code}
from pyspark.mllib.recommendation import ALS, Rating, MatrixFactorizationModel

# Load and parse the data
data = sc.textFile("data/mllib/als/test.data")
ratings = data.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 20
model = ALS.train(ratings, rank, numIterations)
{code}

And I got this error:
{code}
>>> model = ALS.train(ratings, rank, numIterations)
15/02/27 14:41:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 14:41:24 WARN LoadSnappy: Snappy native library not loaded
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 2)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 4)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

However, re-running the same train() call immediately worked, and I have not yet been able
to reproduce the bug.
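
Since the failure is intermittent, one way to hunt for it is to rerun the same call many times and record any attempt that raises. The following is a minimal sketch of such a loop, not a confirmed reproduction: the helper name `run_repeatedly` and the attempt count are hypothetical, and it assumes the failure surfaces as a Python exception. The stack trace above is logged by the DAGScheduler on the JVM side, so the driver log may still need to be checked by hand after each run.

```python
import traceback


def run_repeatedly(fn, attempts=50):
    """Call fn() `attempts` times and collect (attempt_index, traceback_text) for each failure.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    failures = []
    for i in range(attempts):
        try:
            fn()
        except Exception:
            # Keep the full traceback text so the failing attempt can be inspected later.
            failures.append((i, traceback.format_exc()))
    return failures


# Usage in the PySpark shell, assuming `ratings`, `rank`, and `numIterations`
# are defined as in the snippet above:
# failures = run_repeatedly(lambda: ALS.train(ratings, rank, numIterations), attempts=50)
# print(len(failures), "attempts raised an exception")
```

An empty result from this loop would not rule out the bug, because accumulator update errors may only be logged rather than raised to Python.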



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
