spark-issues mailing list archives

From "Reynold Xin (JIRA)" <>
Subject [jira] [Commented] (SPARK-24642) Add a function which infers schema from a JSON column
Date Wed, 27 Jun 2018 05:13:00 GMT


Reynold Xin commented on SPARK-24642:

Do we want this as an aggregate function? I think it would be better to take a single string and
infer the schema from that string.

How would the query you provided compile if infer_schema() were an aggregate function?

> Add a function which infers schema from a JSON column
> -----------------------------------------------------
>                 Key: SPARK-24642
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Maxim Gekk
>            Priority: Minor
> A new aggregate function, *infer_schema()*, is needed. The function should infer the schema
> of a set of JSON strings. Its result is a schema in DDL format (or JSON format).
> One use case is passing the output of *infer_schema()* to *from_json()*. Currently,
> the from_json() function requires a schema as a mandatory argument. The schema can be inferred
> programmatically in Scala/Python and passed as the second argument, but this is not possible
> in SQL, where the user has to pass the schema as a string literal. The new function should
> make this possible in SQL, as in the example:
> {code:sql}
> select from_json(json_col, infer_schema(json_col))
> from json_table;
> {code}
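
To make the aggregate-versus-scalar question concrete, here is a rough sketch in plain Python of what inferring one schema over a set of JSON strings might involve. It is not Spark's implementation: the type names, the merge rules (BIGINT and DOUBLE widen to DOUBLE, any other conflict falls back to STRING), and the flat "name TYPE" output are all assumptions made for illustration, and nested objects and arrays are simply treated as STRING.

```python
import json


def _type_of(value):
    """Map a parsed JSON scalar to a simplified DDL type name (assumed names)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    # Strings, and (as a simplification) nested objects and arrays.
    return "STRING"


def _merge(a, b):
    """Combine two observed types for the same field (assumed widening rules)."""
    if a == b:
        return a
    if {a, b} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def infer_schema(json_strings):
    """Infer one flat, DDL-like schema from many JSON object strings."""
    fields = {}  # field name -> type name, or None if only nulls seen so far
    for s in json_strings:
        record = json.loads(s)
        for name, value in record.items():
            if value is None:
                fields.setdefault(name, None)
                continue
            observed = _type_of(value)
            previous = fields.get(name)
            fields[name] = observed if previous is None else _merge(previous, observed)
    # Fields that were always null default to STRING in this sketch.
    return ", ".join(f"{name} {t or 'STRING'}" for name, t in fields.items())


rows = ['{"id": 1, "score": 2}', '{"id": 2, "score": 2.5, "tag": "x"}']
print(infer_schema(rows))
```

The point of the sketch is that the result depends on every row: "score" only becomes DOUBLE and "tag" only appears once the second string is seen. A function that takes a single string would produce a schema per row instead, which is the trade-off raised in the comment above.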

