spark-user mailing list archives

From Eugene Morozov <evgeny.a.moro...@gmail.com>
Subject DataFrame Explode for ArrayBuffer[Any]
Date Sat, 10 Oct 2015 14:06:00 GMT
Hi,

I have a DataFrame with several columns I'd like to explode. Each of these
columns is an ArrayBuffer containing elements of some other type.
I'd assumed the following code would be a legitimate explode function for
any given ArrayBuffer: for any row whose column holds a collection, it
should produce several rows, each carrying one element of that collection:

dataFrame.explode(inputColumn, outputColumn) { a: ArrayBuffer[Any] => a }

But instead I get this exception:

java.lang.UnsupportedOperationException: Schema for type Any is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
  at org.apache.spark.sql.DataFrame.explode(DataFrame.scala:1116)

Is there any way to do what I want?


I'm not very good at Scala yet, so: if I know the exact type of a
particular ArrayBuffer's elements, how can I specify it instead of Any?
Let's say I have the following:
val dataType = ...
How can I then use it with explode?
dataFrame.explode(inputColumn, outputColumn) { a: ArrayBuffer[  /* dataType
*/  ] => a }
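For context, here is a minimal sketch of the kind of concrete-type call I
have in mind. This is only an illustration with made-up column names
("tags", "tag") and a string-array column, assuming the Spark 1.x
DataFrame.explode(inputColumn, outputColumn)(f) signature; it would need a
running SparkContext to actually execute. My understanding is that Catalyst
needs a concrete element type (one it can find a TypeTag for) to derive the
output schema, which is why Any fails, and that Spark hands array columns
to the closure as Seq rather than ArrayBuffer:

```scala
// Hypothetical sketch only: requires Spark 1.x on the classpath and
// a local SparkContext. Column names "tags"/"tag" are invented.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(
  new SparkConf().setAppName("explode-sketch").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// A DataFrame with an array<string> column named "tags".
val df = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "tags")

// The element type is concrete (String), so Catalyst can derive a schema.
// Note the closure takes Seq[String]: Spark exposes array columns as Seq.
val exploded = df.explode("tags", "tag") { tags: Seq[String] => tags }
exploded.show()
```

If that is roughly right, then the question remains how to express the
element type when it is only known at runtime as a value like dataType.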

Thank you in advance.
--
Be well!
Jean Morozov
