spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-25040) Empty string for double and float types should be nulls in JSON
Date Tue, 23 Oct 2018 05:45:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25040.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22787
[https://github.com/apache/spark/pull/22787]

> Empty string for double and float types should be nulls in JSON
> ----------------------------------------------------------------
>
>                 Key: SPARK-25040
>                 URL: https://issues.apache.org/jira/browse/SPARK-25040
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Liang-Chi Hsieh
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The issue itself seems to be a behaviour change between 1.6 and 2.x in whether an empty
> string is treated as null for double and float types. Given this input:
> {code}
> {"a":"a1","int":1,"other":4.4}
> {"a":"a2","int":"","other":""}
> {code}
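> Code:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
>
> val config = new SparkConf().setMaster("local[5]").setAppName("test")
> val sc = SparkContext.getOrCreate(config)
> val sql = new SQLContext(sc)
> // Sanity4.json contains the two records above and is loaded from the classpath
> val filePath = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
> val df = sql.read.json(filePath)  // no user schema, so the schema is inferred
> df.show(30)
> df.printSchema()
> {code}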
> In Spark 1.6, the result is:
> {code}
> +---+----+-----+
> |  a| int|other|
> +---+----+-----+
> | a1|   1|  4.4|
> | a2|null| null|
> +---+----+-----+
> {code}
> {code}
> root
> |-- a: string (nullable = true)
> |-- int: long (nullable = true)
> |-- other: double (nullable = true)
> {code}
> But in Spark 2.2, the result is:
> {code}
> +----+----+-----+
> |   a| int|other|
> +----+----+-----+
> |  a1|   1|  4.4|
> |null|null| null|
> +----+----+-----+
> {code}
> {code}
> root
> |-- a: string (nullable = true)
> |-- int: long (nullable = true)
> |-- other: double (nullable = true)
> {code}
> Another easy reproducer:
> {code}
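> import spark.implicits._  // provides .toDS
> spark.read.schema("a DOUBLE, b FLOAT")
>   .option("mode", "FAILFAST")
>   .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
>   .show()  // an action is needed to trigger parsing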
> {code}
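The report shows the symptom but not the cause. The pure-Scala sketch below models one plausible explanation, which is an assumption and not taken from the report or quoted from Spark's source: an integral field falls back to null when it sees an empty string, the double converter handles string tokens itself (for "NaN" and "Infinity") and therefore rejects `""`, and PERMISSIVE mode replaces the whole record with nulls after any field error. That would account for `a` also turning null in the 2.2 output. All names (`Token`, `EmptyStringModel`, `emptyAsNull`, `parseRow`) are hypothetical and do not exist in Spark; `emptyAsNull = true` models the behaviour the issue title asks for.

```scala
// Hypothetical, simplified model of field-level JSON conversion. None of these
// names exist in Spark; they only mimic the behaviour described in the report.
sealed trait Token
final case class NumTok(value: BigDecimal) extends Token // JSON number
final case class StrTok(text: String) extends Token      // JSON string
case object NullTok extends Token                        // JSON null or missing field

object EmptyStringModel {
  type Converter = Token => Any

  // Shared fallback (assumed): an empty string becomes null, any other mismatch fails.
  private def fallback(t: Token, typ: String): Any = t match {
    case StrTok("") => null
    case other      => throw new RuntimeException(s"Cannot parse $other as $typ")
  }

  val stringConv: Converter = {
    case NullTok   => null
    case StrTok(s) => s
    case t         => fallback(t, "string")
  }

  val longConv: Converter = {
    case NullTok                    => null
    case NumTok(v) if v.isValidLong => v.toLong
    case t                          => fallback(t, "long") // "" -> null via fallback
  }

  // The double converter accepts some strings itself, so an empty string never
  // reaches the shared fallback unless emptyAsNull is set.
  def doubleConv(emptyAsNull: Boolean): Converter = {
    case NullTok                   => null
    case NumTok(v)                 => v.toDouble
    case StrTok("") if emptyAsNull => null
    case StrTok("NaN")             => Double.NaN
    case StrTok("Infinity")        => Double.PositiveInfinity
    case StrTok("-Infinity")       => Double.NegativeInfinity
    case StrTok(s)                 => throw new RuntimeException(s"Cannot parse '$s' as double")
  }

  def schema(emptyAsNull: Boolean): Seq[(String, Converter)] =
    Seq("a" -> stringConv, "int" -> longConv, "other" -> doubleConv(emptyAsNull))

  // Permissive: any field error nulls out the whole record.
  // Fail-fast: the exception is not caught and propagates to the caller.
  def parseRow(record: Map[String, Token], fields: Seq[(String, Converter)], failFast: Boolean): Seq[Any] =
    try fields.map { case (name, conv) => conv(record.getOrElse(name, NullTok)) }
    catch { case _: RuntimeException if !failFast => Seq.fill(fields.size)(null) }

  def main(args: Array[String]): Unit = {
    val row1: Map[String, Token] =
      Map("a" -> StrTok("a1"), "int" -> NumTok(BigDecimal(1)), "other" -> NumTok(BigDecimal("4.4")))
    val row2: Map[String, Token] =
      Map("a" -> StrTok("a2"), "int" -> StrTok(""), "other" -> StrTok(""))
    for (emptyAsNull <- Seq(false, true); row <- Seq(row1, row2))
      println(parseRow(row, schema(emptyAsNull), failFast = false).mkString("[", ", ", "]"))
  }
}
```

Running `main` prints `[a1, 1, 4.4]` and `[null, null, null]` (mirroring the 2.2 table), then `[a1, 1, 4.4]` and `[a2, null, null]` (mirroring the 1.6 table). With `failFast = true`, the second record throws instead, analogous to the FAILFAST reproducer.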



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

