spark-issues mailing list archives

From "Lee Dongjin (JIRA)" <>
Subject [jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
Date Thu, 09 Mar 2017 15:50:38 GMT


Lee Dongjin commented on SPARK-17248:

[~pdxleif] // Although this may be a stale question, let me answer.

There are two ways of implementing enum types in Scala: extending scala.Enumeration, or using
sealed traits with case objects. The 'enumeration object' mentioned in the tuning guide seems
to refer to the second way, that is, sealed traits and case objects.
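To make the two styles concrete, here is a minimal sketch (the names Color and Direction are illustrative, not from the original thread):

```scala
// Style 1: extend scala.Enumeration. Values get stable integer ids
// in declaration order, starting from 0.
object Color extends Enumeration {
  type Color = Value
  val Red, Green, Blue = Value
}

// Style 2: a sealed trait with case objects. The compiler can check
// pattern matches for exhaustiveness over the sealed hierarchy.
sealed trait Direction
case object North extends Direction
case object South extends Direction
case object East extends Direction
case object West extends Direction
```

The first style gives you ordering, ids, and iteration over values for free; the second gives compile-time exhaustiveness checking and allows members per case.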

However, it seems like the sentence "Consider using numeric IDs or enumeration objects instead
of strings for keys." in the tuning guide does not apply to Dataset, as in your case. To my
humble knowledge, Dataset encoders support case classes whose fields are primitive (or otherwise
supported) types, but not fields of arbitrary classes or objects, in this case the enum objects.

If you want a workaround, please check out this example:
It shows a Dataset built on one of the public datasets. Please pay attention to how
I matched the Passenger case class and its corresponding SchemaType - age, pClass, sex and
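The usual workaround is to keep the encoded field as a primitive that Spark's encoders already support, and convert to and from the enum at the boundary of the typed API. A minimal sketch, assuming a hypothetical Passenger case class along the lines of the example above (the field names and the Sex enum are illustrative):

```scala
// An Enumeration whose integer ids we store in the Dataset row.
object Sex extends Enumeration {
  type Sex = Value
  val Male, Female = Value
}

// Keep only primitive fields, so the default product encoder works.
case class Passenger(pClass: Int, sex: Int, age: Double)

// Convert back to the enum when working with a row in typed code.
def sexOf(p: Passenger): Sex.Sex = Sex(p.sex)

// Construct a row, storing the enum as its id.
val p = Passenger(pClass = 1, sex = Sex.Female.id, age = 29.0)
```

With this shape, `Seq(p).toDS()` would use the ordinary product encoder without hitting the "No Encoder found" error, at the cost of doing the id-to-enum conversion yourself.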

[~srowen] // It seems like lots of users are experiencing similar problems. How about changing
this issue into one about providing more examples and explanation in the official documentation?
If needed, I would like to take the issue.

> Add native Scala enum support to Dataset Encoders
> -------------------------------------------------
>                 Key: SPARK-17248
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Silvio Fiorito
> Enable support for Scala enums in Encoders. Ideally, users should be able to use enums
> as part of case classes automatically.
> Currently, this code...
> {code}
> object MyEnum extends Enumeration {
>   type MyEnum = Value
>   val EnumVal1, EnumVal2 = Value
> }
> case class MyClass(col: MyEnum.MyEnum)
> val data = Seq(MyClass(MyEnum.EnumVal1), MyClass(MyEnum.EnumVal2)).toDS()
> {code}
> ...results in this stacktrace:
> {code}
> java.lang.UnsupportedOperationException: No Encoder found for MyEnum.MyEnum
> - field (class: "scala.Enumeration.Value", name: "col")
> - root class: "line550c9f34c5144aa1a1e76bcac863244717.$read.$iwC.$iwC.$iwC.$iwC.MyClass"
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
> 	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
> 	at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
> 	at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
> {code}

This message was sent by Atlassian JIRA
