spark-issues mailing list archives

From "Evelyn Bayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24362) SUM function precision issue
Date Mon, 25 Jun 2018 04:06:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521831#comment-16521831 ]

Evelyn Bayes commented on SPARK-24362:
--------------------------------------

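Yeah, this comes down to the precision limits of the Double type on the JVM. A Double is an IEEE 754 binary floating-point number, so decimal values such as 9.99 cannot be stored exactly. You'll get the same result with: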
{code:java}
bin/spark-shell

scala> val df = spark.range(1).toDF("c1")
df: org.apache.spark.sql.DataFrame = [c1: bigint]

scala> df.selectExpr("49.95 + cast(9.99 as double)").show()
+------------------------------+
|(49.95 + CAST(9.99 AS DOUBLE))|
+------------------------------+
|            59.940000000000005|
+------------------------------+
{code}
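Nothing in that result is Spark-specific. The minimal plain-Scala sketch below (the object and value names are only illustrative) performs the same addition without Spark and uses `java.math.BigDecimal` to expose the exact binary value that the literal 9.99 is rounded to.

```scala
object DoubleRepresentation {
  // Same arithmetic as the spark-shell example above, but in plain Scala with no Spark involved.
  // Both literals are rounded to the nearest Double when parsed, and the addition rounds again.
  val sum: Double = 49.95 + 9.99

  // new java.math.BigDecimal(Double) keeps the exact binary value of the Double;
  // for 9.99 that value is slightly above 9.99.
  val storedNineNineNine: java.math.BigDecimal = new java.math.BigDecimal(9.99)

  def main(args: Array[String]): Unit = {
    println(storedNineNineNine) // exact value of the Double nearest to 9.99
    println(sum)                // 59.940000000000005
    println(sum == 59.94)       // false: one ulp above the Double nearest to 59.94
  }
}
```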
This is part of the reason you aren't meant to use DOUBLE or FLOAT for calculations that need exact decimal results, such as currency amounts; DECIMAL is the usual choice there.
Is this really a bug, or is it just expected behaviour given that Spark runs on the JVM?
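In Spark SQL the corresponding change would be to aggregate a DECIMAL value (for example `cast(9.99 as decimal(10,2))`) rather than a DOUBLE. The sketch below shows the same contrast in plain Scala, with Scala's `BigDecimal` standing in for SQL DECIMAL. It adds 9.99 six times in sequence, which mirrors the six rows produced by the self-join in the reported issue; it does not claim to reproduce Spark's internal aggregation order. The helper names `sumDoubles` and `sumDecimals` are hypothetical.

```scala
object SumPrecision {
  // Hypothetical helper: add the same Double value n times with a left-to-right sum.
  // Every intermediate result is rounded to the nearest Double, so errors can accumulate.
  def sumDoubles(value: Double, n: Int): Double =
    Seq.fill(n)(value).sum

  // Hypothetical helper: the same sum with BigDecimal, which keeps exact decimal digits.
  def sumDecimals(value: BigDecimal, n: Int): BigDecimal =
    Seq.fill(n)(value).sum

  def main(args: Array[String]): Unit = {
    println(sumDoubles(9.99, 6))                // 59.940000000000005
    println(sumDecimals(BigDecimal("9.99"), 6)) // 59.94
  }
}
```

Constructing the decimal from the string `"9.99"` matters: the string form carries the exact decimal value, whereas a value that has already passed through a Double carries its binary rounding along with it.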

> SUM function precision issue
> ----------------------------
>
>                 Key: SPARK-24362
>                 URL: https://issues.apache.org/jira/browse/SPARK-24362
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>            Priority: Major
>
>  How to reproduce:
> {noformat}
> bin/spark-shell --conf spark.sql.autoBroadcastJoinThreshold=-1
> scala> val df = spark.range(6).toDF("c1")
> df: org.apache.spark.sql.DataFrame = [c1: bigint]
> scala> df.join(df, "c1").selectExpr("sum(cast(9.99 as double))").show()
> +-------------------------+
> |sum(CAST(9.99 AS DOUBLE))|
> +-------------------------+
> |       59.940000000000005|
> +-------------------------+{noformat}
>  
> More links:
> [https://stackoverflow.com/questions/42158844/about-a-loss-of-precision-when-calculating-an-aggregate-sum-with-data-frames]
> [https://stackoverflow.com/questions/44134497/spark-sql-sum-function-issues-on-double-value]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


