spark-issues mailing list archives

From "Russell Alexander Spitzer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-11415) Catalyst DateType Shifts Input Data by Local Timezone
Date Fri, 30 Oct 2015 04:11:27 GMT
Russell Alexander Spitzer created SPARK-11415:
-------------------------------------------------

             Summary: Catalyst DateType Shifts Input Data by Local Timezone
                 Key: SPARK-11415
                 URL: https://issues.apache.org/jira/browse/SPARK-11415
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Russell Alexander Spitzer


I've been running type tests for the Spark Cassandra Connector and could not get consistent
results for java.sql.Date. While investigating, I found that Catalyst DateType values are
created by the following code:

https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
{code}
  /**
   * Returns the number of days since epoch from java.sql.Date.
   */
  def fromJavaDate(date: Date): SQLDate = {
    millisToDays(date.getTime)
  }
{code}

However, millisToDays does not honor this contract: it shifts the underlying UTC timestamp by
the local timezone offset before computing the number of days since the epoch. As a result,
the same java.sql.Date can map to a different day depending on the timezone the code runs in.

{code}
  // we should use the exact day as Int, for example, (year, month, day) -> day
  def millisToDays(millisUtc: Long): SQLDate = {
    // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
    // will correctly work as input for function toJavaDate(Int)
    val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
  }
{code}
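To illustrate the shift, here is a minimal sketch that reproduces the quoted millisToDays arithmetic but takes the timezone as an explicit parameter instead of reading Spark's thread-local default zone. The object and method names (DayShiftDemo, millisToDaysIn) are hypothetical and not part of Spark.

```scala
import java.util.TimeZone

// Hypothetical helper: same arithmetic as the quoted millisToDays, but the
// zone is passed in explicitly rather than read from threadLocalLocalTimeZone.
object DayShiftDemo {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  def millisToDaysIn(millisUtc: Long, tz: TimeZone): Int = {
    // Shift the UTC instant by the zone's offset at that instant...
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    // ...then floor to whole days, as the quoted code does.
    Math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }
}
```

For the instant 0L (1970-01-01T00:00:00Z), passing TimeZone.getTimeZone("UTC") yields day 0, while America/Los_Angeles (UTC-8 at that instant) yields day -1, i.e. 1969-12-31.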

The inverse function, daysToMillis, also incorrectly shifts by the timezone:
{code}
  // reverse of millisToDays
  def daysToMillis(days: SQLDate): Long = {
    val millisUtc = days.toLong * MILLIS_PER_DAY
    millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
  }
{code}
Both functions are defined here:
https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
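The inverse can be sketched the same way. Again the zone is an explicit parameter, and the names (InverseShiftDemo, daysToMillisIn) are hypothetical rather than Spark API.

```scala
import java.util.TimeZone

// Hypothetical helper: same arithmetic as the quoted daysToMillis, with the
// zone passed in explicitly instead of read from threadLocalLocalTimeZone.
object InverseShiftDemo {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  def daysToMillisIn(days: Int, tz: TimeZone): Long = {
    // Treat the day number as UTC midnight...
    val millisUtc = days.toLong * MillisPerDay
    // ...then subtract the zone's offset sampled at that UTC midnight.
    millisUtc - tz.getOffset(millisUtc)
  }
}
```

Day 0 decodes to 0L in UTC, 28800000L (1970-01-01T08:00:00Z) in America/Los_Angeles and -32400000L (1969-12-31T15:00:00Z) in Asia/Tokyo, so a day number produced in one zone and decoded in another refers to a different instant.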

This causes off-by-one-day errors and can significantly shift data whenever it is processed
in a timezone other than UTC.
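For comparison, a conversion that follows the contract stated in the fromJavaDate comment, counting whole days since the epoch with no local offset, might look like the sketch below. It assumes "days since epoch" means UTC days; the names (UtcDays, millisToDaysUtc, daysToMillisUtc) are hypothetical and this is not Spark's implementation.

```scala
// Hypothetical zone-independent conversions (illustrative names, not Spark API).
object UtcDays {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Whole UTC days since 1970-01-01. floorDiv rounds toward negative
  // infinity, so instants before the epoch map to negative day numbers.
  def millisToDaysUtc(millisUtc: Long): Int =
    Math.floorDiv(millisUtc, MillisPerDay).toInt

  // UTC midnight of the given day; no timezone offset is applied.
  def daysToMillisUtc(days: Int): Long = days.toLong * MillisPerDay
}
```

Because neither function reads a timezone, daysToMillisUtc(millisToDaysUtc(t)) always lands on UTC midnight of the day containing t, whatever the local zone is, and dates before 1970 still yield negative day numbers (the concern behind the SPARK-6785 comment).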



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

