spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: sql timestamp timezone bug
Date Fri, 18 Mar 2016 21:22:31 GMT
In Spark SQL, a timestamp is the number of microseconds since the epoch, so
it has nothing to do with time zones.

When you compare it against a unix_timestamp or a string, it's better to
convert those into timestamps and then compare them.

In your case, the where clause should be:

where (created > cast('{0}' as timestamp)) and (created <=  cast('{1}'
as timestamp))

Could you try this?
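As a quick illustration of the epoch arithmetic (a pure-Python sketch, not Spark itself, using only the values from your output below):

```python
from datetime import datetime, timezone

# The query boundary string, parsed with its explicit +0000 offset: the
# epoch value is the same no matter what the local time zone is.
boundary = datetime.strptime("2016-03-12 00:30:00+0000", "%Y-%m-%d %H:%M:%S%z")
print(int(boundary.timestamp()))  # 1457742600

# unix_timestamp() in your output reports 1457771430 for the 00:30:30 row.
# Comparing against the true UTC epoch for that wall-clock time shows the
# string was interpreted in the session time zone (PST, UTC-8).
shown = 1457771430
utc = int(datetime(2016, 3, 12, 0, 30, 30, tzinfo=timezone.utc).timestamp())
print((shown - utc) // 3600)  # 8
```

The 8-hour difference is exactly the PST offset, which is why casting the comparison values to timestamp (rather than comparing as strings) avoids the ambiguity.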

On Fri, Mar 18, 2016 at 11:10 AM, Andy Davidson
<Andy@santacruzintegration.com> wrote:
> Hi Davies
>
>
>
> What's the type of `created`? TimestampType?
>
>
>
> The 'created' column in cassandra is a timestamp
> https://docs.datastax.com/en/cql/3.0/cql/cql_reference/timestamp_type_r.html
>
> In the spark data frame it is a timestamp
>
>
> If yes, when created is compared to a string, it will be cast to a
> string, then compared as a string, so it becomes
>
> cast(created as string) > '2016-03-12 00:30:00+0000'
>
> Could you try this
>
> sqlCtx.sql("select created, cast(created as string) from rawTable").show()
>
>
>
> I am not sure I understand your suggestion. In my where clause the date
> range is specified using string literals. I need the value of created to be
> a timestamp
>
> # http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html
> stmnt = "select \
>             row_key, created,  cast(created as string), count,
> unix_timestamp(created) as unixTimeStamp, \
>             unix_timestamp(created, 'yyyy-MM-dd HH:mm:ss.zz') as aedwip, \
>             to_utc_timestamp(created, 'gmt') as gmt \
>          from \
>             rawTable \
>          where \
>              (created > '{0}') and (created <= '{1}') \
>              and \
>              (row_key = 'red' \
>                         or row_key = 'blue' )".format('2016-03-12
> 00:30:00+0000', '2016-03-12 04:30:00+0000')
>
> rawDF = sqlContext.read\
>     .format("org.apache.spark.sql.cassandra")\
>     .options(table="json_timeseries", keyspace="notification")\
>     .load()
> rawDF.registerTempTable(tmpTableName)
> rawDF = sqlCtx.sql(stmnt).cache()
>
>
> The timestamps are still not UTC; they are in PST
>
> root
>  |-- row_key: string (nullable = true)
>  |-- created: timestamp (nullable = true)
>  |-- created: string (nullable = true)
>  |-- count: long (nullable = true)
>  |-- unixTimeStamp: long (nullable = true)
>  |-- aedwip: long (nullable = true)
>  |-- gmt: timestamp (nullable = true)
>
> +-------------+---------------------+-------------------+-----+-------------+----------+---------------------+
> |row_key      |created              |created            |count|unixTimeStamp|aedwip    |gmt                  |
> +-------------+---------------------+-------------------+-----+-------------+----------+---------------------+
> |blue         |2016-03-12 00:30:30.0|2016-03-12 00:30:30|2    |1457771430   |1457771430|2016-03-12 00:30:30.0|
> |blue         |2016-03-12 00:30:45.0|2016-03-12 00:30:45|1    |1457771445   |1457771445|2016-03-12 00:30:45.0|
> |blue         |2016-03-12 00:31:00.0|2016-03-12 00:31:00|1    |1457771460   |1457771460|2016-03-12 00:31:00.0|
> +-------------+---------------------+-------------------+-----+-------------+----------+---------------------+
>
>
> Kind regards
>
> Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

