hivemall-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVEMALL-60) java.io.NotSerializableException if you call each_top_k by using an internal API
Date Thu, 09 Feb 2017 16:00:44 GMT
Takeshi Yamamuro created HIVEMALL-60:
----------------------------------------

             Summary: java.io.NotSerializableException if you call each_top_k by using an internal API
                 Key: HIVEMALL-60
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-60
             Project: Hivemall
          Issue Type: Bug
            Reporter: Takeshi Yamamuro


If you run the code below, you get an exception:

{code}
val df = spark.range(10).selectExpr("id % 3 AS key", "rand() AS x", "CAST(id AS STRING) AS value")
val resultDf = df.each_top_k(lit(100), $"x".as("score"), $"key")

// Run the operations above by kicking an internal API
resultDf.queryExecution.executedPlan.execute().foreach(x => {})

Caused by: java.io.NotSerializableException: scala.collection.Iterator$$anon$12
Serialization stack:
        - object not serializable (class: scala.collection.Iterator$$anon$12, value: empty iterator)
        - field (class: scala.collection.Iterator$$anonfun$toStream$1, name: $outer, type: interface scala.collection.Iterator)
        - object (class scala.collection.Iterator$$anonfun$toStream$1, <function0>)
        - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
        - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@4c4ec306)
        - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
        - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@434fbf49))
        - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
        - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[7] at execute at <console>:31)
        - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
        - object (class scala.Tuple2, (MapPartitionsRDD[7] at execute at <console>:31,<function2>))
  at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
{code}

In most cases, users do not call the API this way; even so, it would be better to fix this.
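For context, the failure mode in the stack trace is generic to JVM serialization, not specific to Hivemall: Java serialization walks the whole object graph, so a single reachable field that is not Serializable (here, a Scala Iterator captured in the RDD's dependency list) aborts the write. The sketch below reproduces the same exception class with a hypothetical `Holder` type standing in for the RDD; the names are illustrative, not from Hivemall's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Iterator;
import java.util.List;

public class NotSerializableDemo {
    // A Serializable object that transitively holds a java.util.Iterator,
    // which is not Serializable -- analogous to the Iterator reachable
    // through the RDD dependencies in the stack trace above.
    static class Holder implements Serializable {
        final Iterator<Integer> iter = List.of(1, 2, 3).iterator();
    }

    public static void main(String[] args) throws IOException {
        ObjectOutputStream out =
            new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            // Serialization recurses into 'iter' and fails.
            out.writeObject(new Holder());
            System.out.println("serialized");
        } catch (NotSerializableException e) {
            System.out.println("caught NotSerializableException");
        }
    }
}
```

Marking such a field `transient` (or materializing the lazy collection before it is captured) is the usual remedy, which is presumably what the fix for this issue amounts to on the Hivemall side.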



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
