spark-issues mailing list archives

From "Jakob Odersky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-12648) UDF with Option[Double] throws ClassCastException
Date Wed, 06 Jan 2016 22:22:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086388#comment-15086388 ]

Jakob Odersky commented on SPARK-12648:
---------------------------------------

In spark-shell:
{code}
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
{code}
Note the inferred schema: you're getting a DataFrame of plain doubles, not optional doubles. I'm not 100% sure, but I'm guessing that accepting Option types when creating a DataFrame is syntactic sugar to avoid nulls in client code; Spark then represents the options as nullable columns, so a UDF receives a (possibly null) Double rather than an Option[Double].
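If that's right, a workaround is to accept {{java.lang.Double}} in the UDF body and lift the possibly-null value into an Option yourself. A minimal sketch (assuming Spark passes null for missing values; the {{addTwo}} name mirrors the reporter's example):
{code}
// Accept a boxed java.lang.Double so a SQL NULL arrives as null,
// then lift it into an Option (Option(null) == None):
val addTwo: java.lang.Double => Option[Double] =
  d => Option(d).map(_.doubleValue + 2)

// In spark-shell this would be registered and applied as usual:
// val addTwoUdf = udf(addTwo)
// df.withColumn("plusTwo", addTwoUdf(df("weight"))).show()
{code}
The returned Option is still fine on the output side, since the result schema is inferred as a nullable double.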

> UDF with Option[Double] throws ClassCastException
> -------------------------------------------------
>
>                 Key: SPARK-12648
>                 URL: https://issues.apache.org/jira/browse/SPARK-12648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's schema is correctly
inferred to be a nullable double.
> However I do not seem to be able to write a UDF that takes an Option as an argument:
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2)) 
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> =>
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
> 	at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:18) ~[na:na]
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[na:na]
> 	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> 	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) ~[scala-library-2.10.5.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
