spark-issues mailing list archives

From "Jack Arenas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11995) Partitioning Parquet by DateType
Date Wed, 25 Nov 2015 19:25:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027423#comment-15027423 ]

Jack Arenas commented on SPARK-11995:
-------------------------------------

The issue seems to come from CatalystSchemaConverter.scala: DateType is only ever
parsed from an INT32, and reading from a partition column may (I'm guessing) change the
type to binary. If so, adding

```
case DATE => DateType
```

after line 171 might do the trick. Investigating now.
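
For context, the partition value in the report appears only in the directory name (date=2015-11-25), so one plausible reading is that its type is inferred from that string rather than from the Parquet file. Below is a minimal sketch of that kind of inference, not Spark's actual code: the object PartitionValueSketch, the InferredType tags, and the tryDates flag are all hypothetical names made up for illustration. It shows how a date-shaped value falls through to a string when no date parsing is attempted.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical, simplified type tags; these are NOT Spark's DataType classes.
sealed trait InferredType
case object IntegerLike extends InferredType
case object DoubleLike extends InferredType
case object DateLike extends InferredType
case object StringLike extends InferredType

object PartitionValueSketch {
  // Split a directory name such as "date=2015-11-25" into (column, rawValue).
  def parseSegment(segment: String): Option[(String, String)] =
    segment.split("=", 2) match {
      case Array(name, value) if name.nonEmpty => Some((name, value))
      case _                                   => None
    }

  // Try progressively looser types and fall back to a string.
  // tryDates is an assumed switch that toggles ISO-8601 date parsing.
  def inferType(raw: String, tryDates: Boolean): InferredType =
    if (Try(raw.toInt).isSuccess) IntegerLike
    else if (Try(raw.toDouble).isSuccess) DoubleLike
    else if (tryDates && Try(LocalDate.parse(raw)).isSuccess) DateLike
    else StringLike
}
```

With tryDates = false, inferType("2015-11-25", ...) yields StringLike, which matches the symptom described in the issue; with tryDates = true it yields DateLike.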

> Partitioning Parquet by DateType
> --------------------------------
>
>                 Key: SPARK-11995
>                 URL: https://issues.apache.org/jira/browse/SPARK-11995
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.5.2
>            Reporter: Jack Arenas
>            Priority: Minor
>
> ... After writing to s3 and partitioning by a DateType column, reads on the parquet "table"
> (i.e. s3n://s3_bucket_url/table where date partitions break the table into date-based
> s3n://s3_bucket_url/table/date=2015-11-25 chunks) will show the partitioned date column
> as a StringType...
> https://github.com/databricks/spark-redshift/issues/122



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

