spark-issues mailing list archives

From "Giorgio Massignani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-18350) Support session local timezone
Date Thu, 23 Mar 2017 11:22:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938096#comment-15938096 ]

Giorgio Massignani edited comment on SPARK-18350 at 3/23/17 11:22 AM:
----------------------------------------------------------------------

I'd like to share what we did to support the Oracle `TIMESTAMP WITH TIME ZONE` type.
We would like to upgrade to the latest Spark version, but since nothing has changed in this
area, we implemented it against Spark 1.6.1 in Scala.

In our case, we build `StructType` and `StructField` instances programmatically, creating
DataFrames from RDDs.

The first problem with time zones: how do we send a time zone embedded in a `Timestamp`
column?
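The underlying difficulty is that `java.sql.Timestamp` only stores an instant (epoch millis) and carries no zone of its own; the zone is supplied externally when the value is rendered or bound. A minimal Java illustration of this (the class name is ours, purely for demonstration):

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TzDemo {
    public static void main(String[] args) {
        // A java.sql.Timestamp is just epoch millis; it has no zone of its own.
        Timestamp ts = new Timestamp(0L); // the instant 1970-01-01T00:00:00Z

        // The same instant renders differently depending on the zone we apply:
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(fmt.format(ts)); // 1970-01-01 00:00:00

        fmt.setTimeZone(TimeZone.getTimeZone("America/New_York"));
        System.out.println(fmt.format(ts)); // 1969-12-31 19:00:00
    }
}
```

This is why the workaround below pairs the epoch millis with an explicit zone id: without the pair, the zone is lost by the time the value reaches the JDBC driver.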

My workaround was to create a new type, `TimestampTz`, which has a `UserDefinedType`
(`TimestampTzUdt`) and a Kryo serialiser (`TimestampTzKryo`).
```
@SQLUserDefinedType(udt = classOf[TimestampTzUdt])
@DefaultSerializer(classOf[TimestampTzKryo])
class TimestampTz(val time: Long, val timeZoneId:String)
```
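The `getCalendar` method used in the `JdbcUtils` snippets below is not shown in the comment; a plausible reading, sketched here in plain Java (the `time`/`timeZoneId` fields and the method name mirror the Scala class above, everything else is our assumption), is that it builds a `Calendar` in the stored zone so that JDBC can bind the timestamp with it:

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical Java analogue of the Scala TimestampTz class above.
public class TimestampTz {
    public final long time;          // epoch millis (the UTC instant)
    public final String timeZoneId;  // e.g. "Europe/Dublin"

    public TimestampTz(long time, String timeZoneId) {
        this.time = time;
        this.timeZoneId = timeZoneId;
    }

    // Builds the Calendar that PreparedStatement.setTimestamp(idx, ts, cal)
    // expects: same instant, but carrying the stored zone.
    public Calendar getCalendar() {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(timeZoneId));
        cal.setTimeInMillis(time);
        return cal;
    }
}
```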
The second problem: how do we customise Spark at the point where it calls `PreparedStatement.setXXX`?

This forced us to create a new `DataFrameWriter`, duplicating its code, because it is a `final class`.

With a `CustomDataFrameWriter`, it then has to call into `JdbcUtils`, which is where the
customisation needs to happen.

We created a `CustomJdbcUtils`, a proxy of `JdbcUtils` that changes only the place where
it calls `PreparedStatement.setTimestamp`:
```
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
```
It would be perfect if the Oracle driver worked as we expected and sent the time zone
through to the column.

However, to make it work, we have to call an Oracle-specific class:
```
case TimestampTzUdt =>
  val timestampTz = row.getAs[TimestampTz](i)
  val cal = timestampTz.getCalendar
  if (isOracle)
    stmt.setObject(i + 1, new oracle.sql.TIMESTAMPTZ(conn, new java.sql.Timestamp(timestampTz.time), cal))
  else
    stmt.setTimestamp(i + 1, new java.sql.Timestamp(timestampTz.time), cal)
```
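The snippet does not show where the `isOracle` flag comes from; one common way to derive it (our guess, not necessarily what the original code does) is to inspect the JDBC URL's scheme prefix:

```java
public class JdbcDialects {
    // Hypothetical helper: detect the Oracle driver from the JDBC URL prefix.
    public static boolean isOracle(String jdbcUrl) {
        return jdbcUrl != null && jdbcUrl.startsWith("jdbc:oracle:");
    }
}
```

Spark's own `org.apache.spark.sql.jdbc.JdbcDialect.canHandle(url)` dispatches on the URL in a similar way, which is one place such a hook could naturally live.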


To sum up, what would we expect from Spark?

Some hooks or shortcuts to make it easier to customise Spark SQL for cases like these.



> Support session local timezone
> ------------------------------
>
>                 Key: SPARK-18350
>                 URL: https://issues.apache.org/jira/browse/SPARK-18350
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Takuya Ueshin
>              Labels: releasenotes
>             Fix For: 2.2.0
>
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which
> is bad if users are not in the same timezones as the machines, or if different users have
> different timezones.
> We should introduce a session local timezone setting that is used for execution.
> An explicit non-goal is locale handling.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

