spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16216) CSV data source does not write date and timestamp correctly
Date Tue, 19 Jul 2016 12:59:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384107#comment-15384107 ]

Hyukjin Kwon commented on SPARK-16216:
--------------------------------------

Let me leave my thoughts here in case they are helpful. As you all know, neither JSON nor
CSV has a specification for date formats, so this comes down to how we define the default
format.

For a default format, I would expect Spark to be able to read back what it wrote without
extra configuration. If a numeric type is the default, we cannot differentiate a timestamp
from a plain number. I am fairly sure we will see more JIRAs like SPARK-16597 in the future
if the default is numeric.
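
To illustrate the round-trip concern, here is a minimal sketch in plain Python using only the standard csv module. It is not Spark code; the helper names (write_rows, read_rows, looks_like_timestamp) are made up for illustration, and it assumes the numeric form is microseconds since the Unix epoch.

```python
import csv
import io
from datetime import datetime, timezone


def write_rows(rows):
    """Serialize rows to CSV text in memory."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of string cells."""
    return list(csv.reader(io.StringIO(text)))


def looks_like_timestamp(cell):
    """Return True if the cell parses as an ISO 8601 timestamp with an offset."""
    try:
        datetime.strptime(cell, "%Y-%m-%dT%H:%M:%S%z")
        return True
    except ValueError:
        return False


ts = datetime(2015, 8, 27, 1, 0, 0, tzinfo=timezone.utc)
micros = int(ts.timestamp()) * 1_000_000  # assumed numeric encoding
plain_long = 1440637200000000             # an ordinary integer column value

# Numeric default: after a round trip the two cells are indistinguishable.
numeric_back = read_rows(write_rows([[micros], [plain_long]]))

# Formatted default: the timestamp cell is recognizable on read.
string_back = read_rows(write_rows([[ts.isoformat()], [plain_long]]))
```

With the numeric form, a reader has no way to tell the timestamp from the plain integer without an explicit schema; with the formatted form, the timestamp can be recognized from the cell text alone.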

If it were up to me and I had to choose only one, I would choose the option that seems
less likely to cause issues in the future over the one whose downside is ambiguity.

In short, how about matching CSV to JSON with explicit warnings in the documentation, or
using, for both, a standard such as [ISO 8601|https://en.m.wikipedia.org/wiki/ISO_8601] which
implies the timezone is UTC?


> CSV data source does not write date and timestamp correctly
> -----------------------------------------------------------
>
>                 Key: SPARK-16216
>                 URL: https://issues.apache.org/jira/browse/SPARK-16216
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> Currently, the CSV data source writes {{DateType}} and {{TimestampType}} as below:
> {code}
> +----------------+
> |            date|
> +----------------+
> |1440637200000000|
> |1414459800000000|
> |1454040000000000|
> +----------------+
> {code}
> It would be nicer if it wrote dates and timestamps as formatted strings, just like the JSON data source.
> Also, the CSV data source currently supports the {{dateFormat}} option to read dates and timestamps in a custom format. It might be better if this option could be applied when writing as well.
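
For reference, the raw values in the table above can be rendered as formatted strings with a short Python sketch. This is only an illustration under two assumptions: that the values are microseconds since the Unix epoch, and that the output should be in UTC. The function format_micros and its date_format parameter are hypothetical and are not the Spark API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_micros(micros, date_format="%Y-%m-%dT%H:%M:%SZ"):
    """Format microseconds since the epoch as a UTC string.

    date_format is a strftime pattern standing in for a write-side
    option; the default is an ISO 8601 style pattern.
    """
    return (EPOCH + timedelta(microseconds=micros)).strftime(date_format)


# The three values shown in the issue description.
raw_values = [1440637200000000, 1414459800000000, 1454040000000000]

formatted = [format_micros(v) for v in raw_values]
# -> ['2015-08-27T01:00:00Z', '2014-10-28T01:30:00Z', '2016-01-29T04:00:00Z']

# A custom pattern, analogous to applying a format option when writing.
dates_only = [format_micros(v, "%Y/%m/%d") for v in raw_values]
```

Passing the same pattern on both the read and write side is what would make the round trip work without guessing.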




