Date: Fri, 21 Aug 2015 08:10:45 +0000 (UTC)
From: "Michal Laclavik (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-10115) MLlib ALS training fails with java.lang.ClassCastException
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

[ https://issues.apache.org/jira/browse/SPARK-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706381#comment-14706381 ]

Michal Laclavik commented on SPARK-10115:
-----------------------------------------

Yes, that is right. It was not obvious to me from the errors, but when I examined the worker logs I also found:

{code}
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{code}

Sorry for bothering you with this; I did not realize it from those error messages at first.
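For context: in Spark 1.x, MLlib's ALS represents user and product IDs as 32-bit ints (the Scala Rating class is Rating(user: Int, product: Int, rating: Double)), so 64-bit IDs like 1205640308657491975 cannot be passed through. A minimal, illustrative sketch of one workaround is to remap the long IDs to dense small ints before training. The plain-Python dict below is only for illustration; on a real RDD you would build the mapping with something like zipWithUniqueId() plus a join:

```python
# Sketch: remap 64-bit IDs to dense ints that fit in a 32-bit Java Integer,
# as required by MLlib ALS in Spark 1.x. Illustrative only (plain lists,
# not RDDs); the sample tuples are taken from the issue description below.

ratings = [
    (1205640308657491975, 50233468418, 1.0),
    (4743366459073625989, 50233472294, 1.0),
    (56766162624422850, 74848929776, 1.0),
]

def make_index(ids):
    """Assign each distinct 64-bit ID a dense int index (0, 1, 2, ...)."""
    return {orig: i for i, orig in enumerate(sorted(set(ids)))}

user_index = make_index(u for u, _, _ in ratings)
item_index = make_index(p for _, p, _ in ratings)

# Rewrite the ratings with the compact int IDs; keep user_index/item_index
# around to translate model predictions back to the original 64-bit IDs.
int_ratings = [(user_index[u], item_index[p], r) for u, p, r in ratings]
```

After this remapping, int_ratings is safe to parallelize and pass to ALS.train.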
> MLlib ALS training fails with java.lang.ClassCastException
> ----------------------------------------------------------
>
>                 Key: SPARK-10115
>                 URL: https://issues.apache.org/jira/browse/SPARK-10115
>             Project: Spark
>          Issue Type: Bug
>        Environment: first experienced on spark 1.2.1 but then also with latest
>                     spark-1.4.1-bin-hadoop2.6
>            Reporter: Michal Laclavik
>
> I am running ALS collaborative filtering training on data which looks as follows (sample by running "user_product.take(10)"):
> {code}
> [(1205640308657491975, 50233468418, 1.0),
>  (4743366459073625989, 50233472294, 1.0),
>  (4743366459073625989, 50233473253, 1.0),
>  (4743366459073625989, 75586230246, 1.0),
>  (4743366459073625989, 50233473248, 1.0),
>  (56766162624422850, 74848929776, 1.0),
>  (56766162624422850, 50233473397, 1.0),
>  (56766162624422850, 78185852309, 1.0),
>  (56766162624422850, 73533710263, 1.0),
>  (56766162624422850, 78185852319, 1.0)]
> {code}
> and then I call training on that RDD:
> {code}
> rank = 12
> iterations = 5
> model = ALS.train(user_product, rank, iterations)
> {code}
> and I get following error:
> {code}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> in ()
>       2 rank = 12
>       3 iterations=5
> ----> 4 model = ALS.train(user_product, rank, iterations)
>
> /opt/spark/python/pyspark/mllib/recommendation.py in train(cls, ratings, rank, iterations, lambda_, blocks, nonnegative, seed)
>     192                   seed=None):
>     193         model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
> --> 194                               lambda_, blocks, nonnegative, seed)
>     195         return MatrixFactorizationModel(model)
>     196
>
> /opt/spark/python/pyspark/mllib/common.py in callMLlibFunc(name, *args)
>     126     sc = SparkContext._active_spark_context
>     127     api = getattr(sc._jvm.PythonMLLibAPI(), name)
> --> 128     return callJavaFunc(sc, api, *args)
>     129
>     130
>
> /opt/spark/python/pyspark/mllib/common.py in callJavaFunc(sc, func, *args)
>     119     """ Call Java Function """
>     120     args = [_py2java(sc, a) for a in args]
> --> 121     return _java2py(sc, func(*args))
>     122
>     123
>
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539
>     540         for temp_arg in temp_args:
>
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                         'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                         format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o448.trainALSModel.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 57.0 failed 1 times, most recent failure: Lost task 9.0 in stage 57.0 (TID 4187, localhost): java.lang.ClassCastException
>
> Driver stacktrace:
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org