spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering
Date Thu, 07 Jun 2018 21:18:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24489:
------------------------------------

    Assignee: Apache Spark

> No check for invalid input type of weight data in ml.PowerIterationClustering
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-24489
>                 URL: https://issues.apache.org/jira/browse/SPARK-24489
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: shahid
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The following test case fails as shown below. Currently, ml.PIC performs no check on the
> data type of the weight column; it should verify that the weight column has a valid (numeric) data type.
> {code:java}
>   test("invalid input types for weight") {
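>     // The "weight" column holds strings rather than numbers
>     val invalidWeightData = spark.createDataFrame(Seq(
>       (0L, 1L, "a"),
>       (2L, 3L, "b")
>     )).toDF("src", "dst", "weight")
>     val pic = new PowerIterationClustering()
>       .setWeightCol("weight")
>     // With no type check, this fails inside a task with scala.MatchError (see trace below)
>     val result = pic.assignClusters(invalidWeightData)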
>   }
> {code}
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, executor driver): scala.MatchError: [0,1,null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
> 	at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
> 	at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> 	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
> 	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> {code}
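The MatchError on [0,1,null] suggests that the string "a" in the weight column was cast to a double, which Spark SQL turns into null, and the resulting row then failed a pattern match that expected a Double weight. A minimal sketch of the missing up-front check follows, assuming only the public org.apache.spark.sql.types API. The object WeightColumnCheck and the method checkWeightColumn are hypothetical names for illustration, not the code that was merged for this issue.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not Spark's internal implementation: validates the
// weight column's type on the driver, before any Spark job is launched.
object WeightColumnCheck {
  def checkWeightColumn(schema: StructType, weightCol: String): Unit = {
    // Fail with a clear message if the column is missing altogether
    require(schema.fieldNames.contains(weightCol),
      s"Weight column '$weightCol' does not exist in the input schema.")
    schema(weightCol).dataType match {
      // Byte, Short, Integer, Long, Float, Double and Decimal types all
      // extend NumericType and can be cast to DoubleType directly.
      case _: NumericType => ()
      // Anything else (e.g. StringType) is rejected up front
      case other =>
        throw new IllegalArgumentException(
          s"Weight column '$weightCol' must be of a numeric type, " +
            s"but was of type ${other.simpleString}.")
    }
  }
}
```

Calling such a check on the input schema before building the graph (a hypothetical placement, for example at the start of assignClusters) would make the test case above fail fast on the driver with an IllegalArgumentException naming the column, instead of a MatchError inside an executor task.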



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


