spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-20270) na.fill will change the values in long or integer when the default value is in double
Date Sun, 09 Apr 2017 08:05:42 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20270:
------------------------------------

    Assignee: DB Tsai  (was: Apache Spark)

> na.fill will change the values in long or integer when the default value is in double
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-20270
>                 URL: https://issues.apache.org/jira/browse/SPARK-20270
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>            Priority: Critical
>
> This bug was partially addressed in SPARK-18555, but the root cause was not completely fixed. The bug is critical for our application because it changes member ids stored as Long: when a member id is too large to be represented losslessly as a Double, na.fill alters its value.

> Here is an example of how this happens. With
> {code}
>       Seq[(java.lang.Long, java.lang.Double)]((null, 3.14), (9123146099426677101L, null),
>         (9123146560113991650L, 1.6), (null, null)).toDF("a", "b").na.fill(0.2)
> {code}
> the logical plan will be
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as double) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> Note that even when the value is not null, Spark first casts the Long to Double and then casts the result back to Long, which loses precision.
> The original value should be left unchanged when it is not null, but Spark changes it, which is wrong.
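The precision loss described above can be reproduced without Spark. The following is a minimal sketch in plain Scala; the object and method names (`RoundTripDemo`, `roundTrip`) are hypothetical and exist only for illustration. It relies on the fact that a Double has a 53-bit significand, so Long values of this magnitude (between 2^62 and 2^63) are rounded to a multiple of 1024 when converted.

```scala
object RoundTripDemo {
  // Mimics cast(cast(a as double) as bigint) on a single non-null value.
  // Hypothetical helper, not part of Spark.
  def roundTrip(v: Long): Long = v.toDouble.toLong

  def main(args: Array[String]): Unit = {
    // The two member ids taken from the example in the report.
    val ids = Seq(9123146099426677101L, 9123146560113991650L)
    ids.foreach { id =>
      val back = roundTrip(id)
      // For ids this large, the value that comes back differs from the original.
      println(s"original=$id roundTripped=$back changed=${back != id}")
    }
  }
}
```

Neither id is a multiple of 1024, so neither survives the round trip unchanged.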
> With the PR, the logical plan will be 
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L, coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> which behaves correctly without changing the original Long values.
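The difference between the two plans for column `a` can be modeled with plain Scala functions. This is only a sketch of the expression semantics, not Spark's implementation: `NaFillModel`, `fillViaDouble`, and `fillDirect` are hypothetical names, and `Option[Long]` stands in for a nullable bigint column value.

```scala
object NaFillModel {
  // Models the plan before the PR:
  //   cast(coalesce(cast(a as double), cast(0.2 as double)) as bigint)
  // Every value, null or not, passes through Double.
  def fillViaDouble(a: Option[Long], fill: Double): Long =
    a.map(_.toDouble).getOrElse(fill).toLong

  // Models the plan with the PR:
  //   coalesce(a, cast(0.2 as bigint))
  // Only the fill value is cast; a non-null Long is returned untouched.
  def fillDirect(a: Option[Long], fill: Double): Long =
    a.getOrElse(fill.toLong)

  def main(args: Array[String]): Unit = {
    val id = 9123146099426677101L
    println(s"via double: ${fillViaDouble(Some(id), 0.2)}")
    println(s"direct:     ${fillDirect(Some(id), 0.2)}")
    println(s"null input: ${fillDirect(None, 0.2)}")
  }
}
```

In both models a null input is filled with 0, because casting 0.2 to a Long truncates it; the models differ only in how they treat non-null values.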




