spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22395) Fix the behavior of timestamp values for Pandas to respect session timezone
Date Mon, 06 Nov 2017 14:54:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240396#comment-16240396 ]

Apache Spark commented on SPARK-22395:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/19674

> Fix the behavior of timestamp values for Pandas to respect session timezone
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22395
>                 URL: https://issues.apache.org/jira/browse/SPARK-22395
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Takuya Ueshin
>
> When converting between a Pandas DataFrame/Series and a Spark DataFrame using {{toPandas()}} or pandas UDFs, timestamp values respect the Python system timezone instead of the session timezone.
> For example, suppose we use {{"America/Los_Angeles"}} as the session timezone and have a timestamp value {{"1970-01-01 00:00:01"}} in that timezone. I'm in Japan, so the Python timezone would be {{"Asia/Tokyo"}}.
> The current {{toPandas()}} returns the following timestamp value:
> {noformat}
> >>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> >>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
> >>> df.show()
> +-------------------+
> |                 ts|
> +-------------------+
> |1970-01-01 00:00:01|
> +-------------------+
> >>> df.toPandas()
>                    ts
> 0 1970-01-01 17:00:01
> {noformat}
> As you can see, the value becomes {{"1970-01-01 17:00:01"}} because it respects the Python timezone.
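
The two wall-clock values in the report can be checked without Spark. The following is a minimal sketch using only the Python standard library; the fixed UTC offsets are an assumption that stands in for the named zones and holds only for January 1970 (UTC-8 for America/Los_Angeles, UTC+9 for Asia/Tokyo), and the helper name `wall_clock` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the named zones. They are valid for
# January 1970 only (America/Los_Angeles = UTC-8, Asia/Tokyo = UTC+9).
LOS_ANGELES = timezone(timedelta(hours=-8))
TOKYO = timezone(timedelta(hours=9))

EPOCH_SECONDS = 28801  # the long value used in the report


def wall_clock(epoch_seconds, tz):
    """Render an epoch instant as wall-clock text in the given zone."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


session_view = wall_clock(EPOCH_SECONDS, LOS_ANGELES)  # what df.show() prints
python_view = wall_clock(EPOCH_SECONDS, TOKYO)         # what toPandas() returned

print(session_view)
print(python_view)
```

The same instant, 28801 seconds after the epoch, reads as 00:00:01 in the session timezone and as 17:00:01 in the Python system timezone, which is the 17-hour gap seen between `df.show()` and `df.toPandas()`.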
> As discussed in https://github.com/apache/spark/pull/18664, we consider this behavior a bug; the value should be {{"1970-01-01 00:00:01"}}.
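
To illustrate the expected result, here is a hedged sketch of re-expressing a naive timestamp from the Python system timezone in the session timezone. It is not the change made in the pull request; the function name `to_session_wall_clock` is hypothetical, and the fixed offsets are again an assumption valid only for January 1970.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets valid for January 1970 only.
LOS_ANGELES = timezone(timedelta(hours=-8))  # stands in for the session timezone
TOKYO = timezone(timedelta(hours=9))         # stands in for the Python timezone


def to_session_wall_clock(naive_local, local_tz, session_tz):
    """Reinterpret a naive local wall-clock value as session-zone wall clock."""
    aware = naive_local.replace(tzinfo=local_tz)  # attach the zone it was rendered in
    # Shift to the session zone, then drop tzinfo to get a naive value again.
    return aware.astimezone(session_tz).replace(tzinfo=None)


observed = datetime(1970, 1, 1, 17, 0, 1)  # value returned by toPandas() in the report
expected = to_session_wall_clock(observed, TOKYO, LOS_ANGELES)
print(expected)
```

Under these assumptions the observed value maps back to 1970-01-01 00:00:01, the value the report says `toPandas()` should return.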



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

