spark-issues mailing list archives

From "Teng Peng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24391) to_json/from_json should support arrays of primitives, and more generally all JSON
Date Sat, 26 May 2018 17:28:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491745#comment-16491745
] 

Teng Peng commented on SPARK-24391:
-----------------------------------

My plan is to follow SPARK-19849 and SPARK-21513 to support more primitive types. I
will start with StringType to see how it goes.
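For reference, the round-trip the reporter is asking for can be sketched outside Spark. This is a minimal plain-Scala illustration of the intended semantics only; `fromJsonIntArray` and `toJsonIntArray` are hypothetical helpers with a naive hand-rolled parse, not Spark API, and they assume a flat, well-formed JSON array of ints:

```scala
object JsonPrimitiveArraySketch {
  // Hypothetical: parse a flat JSON array of ints, e.g. "[1, 2, 3]" -> Seq(1, 2, 3).
  // No error handling; assumes well-formed, non-nested input.
  def fromJsonIntArray(s: String): Seq[Int] =
    s.trim.stripPrefix("[").stripSuffix("]").split(",").map(_.trim.toInt).toSeq

  // Hypothetical: serialize a sequence of ints back to a JSON array string.
  def toJsonIntArray(xs: Seq[Int]): String =
    xs.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    assert(fromJsonIntArray("[1, 2, 3]") == Seq(1, 2, 3))
    assert(toJsonIntArray(Seq(1, 2, 3)) == "[1, 2, 3]")
  }
}
```

The point is only that arrays of primitives have an unambiguous JSON representation, so from_json/to_json supporting them is a schema-acceptance question, not a serialization-format one.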

> to_json/from_json should support arrays of primitives, and more generally all JSON 
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-24391
>                 URL: https://issues.apache.org/jira/browse/SPARK-24391
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Sam Kitajima-Kimbrel
>            Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-19849 and https://issues.apache.org/jira/browse/SPARK-21513 brought support for more column types to functions.to_json/from_json, but I also have cases where I'd like to simply (de)serialize an array of primitives to/from JSON when outputting to certain destinations, which does not work:
> {code:java}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> import spark.implicits._
> import spark.implicits._
> scala> val df = Seq("[1, 2, 3]").toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: string]
> scala> val schema = new ArrayType(IntegerType, false)
> schema: org.apache.spark.sql.types.ArrayType = ArrayType(IntegerType,false)
> scala> df.select(from_json($"a", schema))
> org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(`a`)' due to data type mismatch: Input schema array<int> must be a struct or an array of structs.;;
> 'Project [jsontostructs(ArrayType(IntegerType,false), a#3, Some(America/Los_Angeles)) AS jsontostructs(a)#10]
> scala> val arrayDf = Seq(Array(1, 2, 3)).toDF("arr")
> arrayDf: org.apache.spark.sql.DataFrame = [arr: array<int>]
> scala> arrayDf.select(to_json($"arr"))
> org.apache.spark.sql.AnalysisException: cannot resolve 'structstojson(`arr`)' due to data type mismatch: Input type array<int> must be a struct, array of structs or a map or array of map.;;
> 'Project [structstojson(arr#19, Some(America/Los_Angeles)) AS structstojson(arr)#26]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

