spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Apache Spark (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-31557) Legacy parser incorrectly interprets pre-Gregorian dates
Date Thu, 30 Apr 2020 04:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-31557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096116#comment-17096116
] 

Apache Spark commented on SPARK-31557:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28408

> Legacy parser incorrectly interprets pre-Gregorian dates
> --------------------------------------------------------
>
>                 Key: SPARK-31557
>                 URL: https://issues.apache.org/jira/browse/SPARK-31557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>             Fix For: 3.0.0
>
>
> With CSV:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map(x
=> s"$x,$x")
> seq: Seq[String] = List(0002-01-01,0002-01-01, 1000-01-01,1000-01-01, 1500-01-01,1500-01-01,
1800-01-01,1800-01-01)
> scala> val ds = seq.toDF("value").as[String]
> ds: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> spark.read.schema("expected STRING, actual DATE").csv(ds).show
> +----------+----------+
> |  expected|    actual|
> +----------+----------+
> |0002-01-01|0001-12-30|
> |1000-01-01|1000-01-06|
> |1500-01-01|1500-01-10|
> |1800-01-01|1800-01-01|
> +----------+----------+
> scala> 
> {noformat}
> Similarly, with JSON:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map {
x =>
>   s"""{"expected": "$x", "actual": "$x"}"""
> }
>      |      | seq: Seq[String] = List({"expected": "0002-01-01", "actual": "0002-01-01"},
{"expected": "1000-01-01", "actual": "1000-01-01"}, {"expected": "1500-01-01", "actual": "1500-01-01"},
{"expected": "1800-01-01", "actual": "1800-01-01"})
> scala> 
> scala> val ds = seq.toDF("value").as[String]
> ds: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> spark.read.schema("expected STRING, actual DATE").json(ds).show
> +----------+----------+
> |  expected|    actual|
> +----------+----------+
> |0002-01-01|0001-12-30|
> |1000-01-01|1000-01-06|
> |1500-01-01|1500-01-10|
> |1800-01-01|1800-01-01|
> +----------+----------+
> scala> 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message