spark-issues mailing list archives

From "Hafthor Stefansson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23576) SparkSQL - Decimal data missing decimal point
Date Fri, 25 May 2018 21:28:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491267#comment-16491267 ]

Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:27 PM:
---------------------------------------------------------------------

Here's an equivalent problem:

spark.sql("select cast(1 as decimal(38,18)) as x").write.format("parquet").save("decimal.parq")

spark.read.schema(spark.sql("select cast(1 as decimal) as x").schema).parquet("decimal.parq").show

returns 1000000000000000000!

It should throw, like it would if I specified a schema with x as float, or some other type.
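The wrong value falls out of how Parquet encodes decimals: the file stores an unscaled integer and records the scale in the schema, so supplying a read schema with a different scale reinterprets the same unscaled digits. A minimal Python sketch of that mechanism (an illustration, not Spark's actual read path; note that a bare "decimal" defaults to decimal(10,0) in Spark SQL):

```python
from decimal import Decimal

# Parquet stores a decimal column as an unscaled integer; the scale lives
# in the file schema. 1 written as decimal(38,18) has unscaled value 10**18.
unscaled = 10 ** 18

# Reading with a user-supplied schema of plain "decimal" (scale 0) applies
# scale 0 to the stored digits instead of rescaling them:
as_scale_0 = Decimal(unscaled).scaleb(0)
print(as_scale_0)   # 1000000000000000000 -- the bogus value above

# Honoring the scale recorded in the file gives the intended value:
as_scale_18 = Decimal(unscaled).scaleb(-18)
print(as_scale_18)  # 1.000000000000000000
```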

Or maybe do what double casting would do:

spark.sql("select cast(cast(1 as decimal(38,10)) as decimal(38,18)) as x").show

returns 1.000000000000000000

except I'd be worried about getting nulls when the value exceeds the target range:

spark.sql("select cast(cast(10 as decimal(2,0)) as decimal(2,1)) as x").show

returns null!
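The null comes from Spark's permissive (non-ANSI) cast semantics: a decimal cast whose result needs more digits than the target precision allows yields null instead of throwing. A hedged Python sketch of that rule (`cast_decimal` is a made-up helper for illustration, not a Spark API):

```python
from decimal import Decimal

def cast_decimal(value, precision, scale):
    """Mimic Spark's permissive decimal cast: rescale to the target scale,
    and return None (Spark's null) when the rescaled value needs more
    digits than the target precision allows."""
    rescaled = value.quantize(Decimal(1).scaleb(-scale))
    if len(rescaled.as_tuple().digits) > precision:
        return None
    return rescaled

# decimal(2,0) value 10 cast to decimal(2,1): 10.0 needs 3 digits, max is 2.
print(cast_decimal(Decimal("10"), 2, 1))   # None -- the null above

# decimal(38,10) value 1 cast to decimal(38,18): fits comfortably.
print(cast_decimal(Decimal("1"), 38, 18))  # 1.000000000000000000
```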

[https://gist.github.com/Hafthor/7f12bdfc41dc96676df03f366ef76f1c]



> SparkSQL - Decimal data missing decimal point
> ---------------------------------------------
>
>                 Key: SPARK-23576
>                 URL: https://issues.apache.org/jira/browse/SPARK-23576
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: spark 2.3.0
> linux
>            Reporter: R
>            Priority: Major
>
> Integers like 3 stored as a decimal display in sparksql as 30000000000 with no decimal point. But hive displays fine as 3.
> Repro steps:
>  # Create a .csv with the value 3
>  # Use spark to read the csv, cast it as decimal(31,8) and output to an ORC file
>  # Use spark to read the ORC, infer the schema (it will infer 38,18 precision) and output to a Parquet file
>  # Create external hive table to read the parquet ( define the hive type as decimal(31,8))
>  # Use spark-sql to select from the external hive table.
>  # Notice how sparksql shows 30000000000    !!!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


