spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25226) Extend functionality of from_json to support arrays of differently-typed elements
Date Mon, 27 Aug 2018 15:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593819#comment-16593819 ]

Hyukjin Kwon commented on SPARK-25226:
--------------------------------------

This is fixed in the current master:

{code}
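>>> from pyspark.sql import functions as F
>>> from pyspark.sql.types import ArrayType, StringType
>>> # Failed on 2.3.1 (schema is neither a struct nor an array of structs); works on current master.
>>> df = df.withColumn("parsed_data", F.from_json(F.col('data'),
...     ArrayType(StringType())))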
>>> df.show()
+--------------------+---+--------------------+
|                data| id|         parsed_data|
+--------------------+---+--------------------+
|["string1", true,...|  1|    [string1, true,]|
|["string2", false...|  2|   [string2, false,]|
|["string3", true,...|  3|[string3, true, a...|
+--------------------+---+--------------------+
{code}
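
The coercion visible in the output above can be approximated in plain Python, without Spark. This is only a rough sketch of what parsing a mixed-type JSON array as an array of strings appears to do: the helper name parse_as_string_array and the sample input are hypothetical, and the handling of malformed or non-array input (returning None) is an assumption, not a statement about Spark's exact behaviour.

```python
import json


def parse_as_string_array(raw):
    """Roughly mimic parsing one JSON text as an array of strings.

    Hypothetical helper for illustration; it does not call Spark.
    """
    try:
        values = json.loads(raw)
    except ValueError:
        # Assumption: malformed input is mapped to a null result.
        return None
    if not isinstance(values, list):
        # Assumption: non-array input is mapped to a null result.
        return None
    result = []
    for value in values:
        if value is None:
            result.append(None)  # JSON null stays null
        elif isinstance(value, str):
            result.append(value)  # strings are kept as-is
        else:
            # Booleans, numbers and nested values become their JSON text.
            result.append(json.dumps(value, separators=(",", ":")))
    return result


print(parse_as_string_array('["string1", true, null]'))
```

Every non-null element ends up as a string, so the original element types are lost. That is the gap the issue description below is about.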

A catalog string (for example, array<string>) is the preferred way to specify the schema. Support for JSON-string schemas should rather be deprecated.

> Extend functionality of from_json to support arrays of differently-typed elements
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-25226
>                 URL: https://issues.apache.org/jira/browse/SPARK-25226
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Yuriy Davygora
>            Priority: Minor
>
> At the moment, the 'from_json' function only supports a STRUCT or an ARRAY of STRUCTS
> as input. Support for an ARRAY of primitives is, apparently, coming with Spark 2.4, but it
> will only support arrays whose elements all have the same data type. It will not, for
> example, support JSON arrays like
> {noformat}
> ["string_value", 0, true, null]
> {noformat}
> which is valid JSON, with the schema
> {noformat}
> {"containsNull":true,"elementType":["string","integer","boolean"],"type":"array"}
> {noformat}
> We would like to kindly ask you to add support for arrays of differently-typed elements
> in the 'from_json' function. This will require extending ArrayType or perhaps adding a
> new type (refer to [[SPARK-25225]]).


