spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values
Date Thu, 30 Oct 2014 18:18:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190531#comment-14190531 ]

Yin Huai commented on SPARK-4077:
---------------------------------

[~gvramana] Thank you for looking at it. It seems that for text sources, Hive uses LazyTimestamp to
deserialize the value and sets it on a TimestampWritable. So this issue only affects text sources,
right?

[The code|https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java#L130]
mentioned by [~gvramana] is shown as follows.
{code:java}
public void set(Timestamp t) {
  if (t == null) {
    timestamp.setTime(0);
    timestamp.setNanos(0);
    return;
  }
  this.timestamp = t;
  bytesEmpty = true;
  timestampEmpty = false;
}
{code}
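
One possible reading of the snippet above, sketched under assumptions: set(Timestamp) keeps the caller's reference, so a later set(null) resets that same object in place. If the reader still holds the Timestamp from the previous row, that earlier value would become the epoch, which prints as 1969-12-31 19:00:00.0 in a UTC-5 time zone. The Holder class and the setCopy method below are hypothetical stand-ins for illustration, not Hive's actual code or a proposed patch.

```java
import java.sql.Timestamp;

final class AliasingSketch {
    // Hypothetical stand-in for a reusable writable; this is not Hive's TimestampWritable.
    static final class Holder {
        private Timestamp timestamp = new Timestamp(0);

        // Mirrors the quoted set(): stores the caller's reference and resets in place on null.
        void set(Timestamp t) {
            if (t == null) {
                timestamp.setTime(0);
                timestamp.setNanos(0);
                return;
            }
            this.timestamp = t;
        }

        // Hypothetical alternative: never share or mutate the caller's object.
        void setCopy(Timestamp t) {
            if (t == null) {
                timestamp = new Timestamp(0);
                return;
            }
            timestamp = (Timestamp) t.clone();
        }
    }

    public static void main(String[] args) {
        // Row 1 is valid, row 2 is broken and arrives as null.
        Timestamp row1 = Timestamp.valueOf("2014-12-11 00:00:00");
        Holder shared = new Holder();
        shared.set(row1);
        shared.set(null);
        // Reports whether row 1 was reset through the shared reference.
        System.out.println("row1 reset by set(null): " + (row1.getTime() == 0L));

        Timestamp row1Copy = Timestamp.valueOf("2014-12-11 00:00:00");
        Holder copying = new Holder();
        copying.setCopy(row1Copy);
        copying.setCopy(null);
        // Reports whether row 1 survived when the holder copies instead of aliasing.
        System.out.println("row1 kept by setCopy(null): " + (row1Copy.getTime() != 0L));
    }
}
```

Whether this aliasing is what actually happens on the Spark SQL read path depends on how the deserialized Timestamp objects are retained between rows, which this sketch does not verify.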

By the way, why is the result of runSqlHive (which asks Hive to run the query) not affected?

> A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4077
>                 URL: https://issues.apache.org/jira/browse/SPARK-4077
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Yin Huai
>            Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
> The DDL statement and the query are shown below...
> {code}
> sql("""
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is 
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with broken string timestamp value, I tried runSqlHive. The result is fine.




