spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21687) Spark SQL should set createTime for Hive partition
Date Sat, 12 Aug 2017 08:28:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-21687:
------------------------------
    Target Version/s:   (was: 2.3.0)
            Priority: Minor  (was: Major)
       Fix Version/s:     (was: 2.3.0)
          Issue Type: Improvement  (was: Bug)

> Spark SQL should set createTime for Hive partition
> --------------------------------------------------
>
>                 Key: SPARK-21687
>                 URL: https://issues.apache.org/jira/browse/SPARK-21687
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Chaozhong Yang
>            Priority: Minor
>
> In Spark SQL, we often use `insert overwrite table t partition(p=xx)` to create a partition
> in a partitioned table. `createTime` is important information for managing the data lifecycle,
> e.g. for TTL.
> However, we found that Spark SQL doesn't call `setCreateTime` in `HiveClientImpl#toHivePartition`,
> as shown below:
> {code:scala}
> def toHivePartition(
>     p: CatalogTablePartition,
>     ht: HiveTable): HivePartition = {
>   val tpart = new org.apache.hadoop.hive.metastore.api.Partition
>   val partValues = ht.getPartCols.asScala.map { hc =>
>     p.spec.get(hc.getName).getOrElse {
>       throw new IllegalArgumentException(
>         s"Partition spec is missing a value for column '${hc.getName}': ${p.spec}")
>     }
>   }
>   val storageDesc = new StorageDescriptor
>   val serdeInfo = new SerDeInfo
>   p.storage.locationUri.map(CatalogUtils.URIToString(_)).foreach(storageDesc.setLocation)
>   p.storage.inputFormat.foreach(storageDesc.setInputFormat)
>   p.storage.outputFormat.foreach(storageDesc.setOutputFormat)
>   p.storage.serde.foreach(serdeInfo.setSerializationLib)
>   serdeInfo.setParameters(p.storage.properties.asJava)
>   storageDesc.setSerdeInfo(serdeInfo)
>   tpart.setDbName(ht.getDbName)
>   tpart.setTableName(ht.getTableName)
>   tpart.setValues(partValues.asJava)
>   tpart.setSd(storageDesc)
>   new HivePartition(ht, tpart)
> }
> {code}
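
The snippet above never calls `setCreateTime` on `tpart`. A minimal sketch of what stamping the partition could look like follows. It is not the actual Spark patch. `PartitionStub`, `toEpochSeconds`, and `stampCreateTime` are hypothetical names introduced here so the example runs without Hive on the classpath; the stub mirrors only the `setCreateTime(int)` setter of the Hive metastore `Partition`, which, as far as we know, stores the time as whole seconds since the epoch.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for org.apache.hadoop.hive.metastore.api.Partition,
// reduced to the single field of interest. Assumption: createTime is an Int
// holding seconds since the epoch.
final class PartitionStub {
  private var createTime: Int = 0
  def setCreateTime(seconds: Int): Unit = { createTime = seconds }
  def getCreateTime: Int = createTime
}

object CreateTimeSketch {
  // Convert epoch milliseconds (e.g. System.currentTimeMillis) to whole epoch seconds.
  def toEpochSeconds(millis: Long): Int =
    TimeUnit.MILLISECONDS.toSeconds(millis).toInt

  // Hypothetical helper: record the creation time on the partition and return it.
  def stampCreateTime(tpart: PartitionStub, createTimeMillis: Long): PartitionStub = {
    tpart.setCreateTime(toEpochSeconds(createTimeMillis))
    tpart
  }
}
```

In `toHivePartition`, the equivalent step would presumably be a single `tpart.setCreateTime(...)` call placed before `new HivePartition(ht, tpart)`. Where the timestamp should come from (the wall clock at insert time, or a field carried on `CatalogTablePartition`) is a design choice this sketch leaves open.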



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
