spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23772) Provide an option to ignore column of all null values or empty map/array during JSON schema inference
Date Thu, 22 Mar 2018 16:48:00 GMT
Xiangrui Meng created SPARK-23772:
-------------------------------------

             Summary: Provide an option to ignore column of all null values or empty map/array during JSON schema inference
                 Key: SPARK-23772
                 URL: https://issues.apache.org/jira/browse/SPARK-23772
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Xiangrui Meng


We often convert data from a JSON source to a structured format periodically. If a field's
values are all null in the initial batch of JSON data, Spark infers the field as StringType.
If a non-null value then appears in that field in the second batch and its type is not
StringType, merging the schemas fails because the two schemas are inconsistent.

The same applies to empty arrays and empty objects. I propose adding an option to the
Spark JSON source that omits such fields until a non-null value is seen.
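
To make the failure and the proposed behavior concrete, here is a minimal pure-Python sketch.
It is a toy model, not Spark's inference code: the names infer_type, infer_schema,
merge_schemas and the drop_if_all_null flag are hypothetical, types are plain strings, and
every field without type information falls back to "string" for simplicity.

```python
# Toy model of JSON schema inference across batches. NOT Spark's implementation;
# all function and parameter names below are hypothetical.

def infer_type(value):
    """Return a type name for one JSON value, or None if it carries no type info."""
    if value is None or value == [] or value == {}:
        return None  # null, empty array, empty object: nothing to infer from
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"


def infer_schema(records, drop_if_all_null=False):
    """Infer {field: type} for one batch; the first informative value per field wins."""
    seen = {}
    for record in records:
        for field, value in record.items():
            seen[field] = seen.get(field) or infer_type(value)
    schema = {}
    for field, inferred in seen.items():
        if inferred is not None:
            schema[field] = inferred
        elif not drop_if_all_null:
            schema[field] = "string"  # the fallback described in this issue
        # else: omit the field until a later batch provides a typed value
    return schema


def merge_schemas(old, new):
    """Merge two batch schemas, raising ValueError on a type conflict."""
    merged = dict(old)
    for field, new_type in new.items():
        old_type = merged.get(field)
        if old_type is not None and old_type != new_type:
            raise ValueError(f"field {field!r}: {old_type} vs {new_type}")
        merged[field] = new_type
    return merged


batch1 = [{"id": 1, "score": None}, {"id": 2, "score": None}]
batch2 = [{"id": 3, "score": 42}]

# With the proposed option, "score" is absent from the first schema,
# so the second batch is free to introduce it with its real type.
first = infer_schema(batch1, drop_if_all_null=True)
print(merge_schemas(first, infer_schema(batch2)))
```

Without the flag, infer_schema(batch1) records "score" as "string", and merging with the
second batch raises ValueError in this sketch, which mirrors the inconsistency described above.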

This is similar to SPARK-12436 but the proposed solution is different.

cc: [~rxin] [~smilegator]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
