hive-issues mailing list archives

From "kaushik srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17824) msck repair table should drop the missing partitions from metastore
Date Fri, 27 Oct 2017 07:17:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221845#comment-16221845 ]

kaushik srinivas commented on HIVE-17824:
-----------------------------------------

I am facing the issues below in the same scenario.

Steps performed:
1. Created an external Hive table on HDFS with a partition for every hour.
2. Manually deleted the HDFS folder of one of the added partitions.
3. Ran msck repair on the table.

Issues observed:
1. msck repair did not drop the partition whose HDFS folder was missing.
2. Queried the table in Hive and Spark SQL with a filter on a partition that still exists.
    a. Hive: 
        select * from xxx where datetime=xxx
        Log output from the Hive console:
        "Oct 27, 2017 2:26:35 AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(datetime, 2017102605)"
        This was logged even though count(*) returned the number of records in the partition given in the query.
     
    b. Spark SQL:
        select * from xxx where datetime=xxx
        stderr:
        java.io.FileNotFoundException: File does not exist: hdfs://xxx/xxx/datetime=2017101215

        The query targeted a partition that exists, but the exception was raised
        for the partition whose HDFS folder had been deleted.


> msck repair table should drop the missing partitions from metastore
> -------------------------------------------------------------------
>
>                 Key: HIVE-17824
>                 URL: https://issues.apache.org/jira/browse/HIVE-17824
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> {{msck repair table <tablename>}} is often used in environments where the new partitions
> are loaded as directories on HDFS or S3 and users want to create the missing partitions in
> bulk. However, currently it only supports addition of missing partitions. If there are any
> partitions which are present in metastore but not on the FileSystem, it should also delete
> them so that it truly repairs the table metadata.
> We should be careful not to break backwards compatibility so we should either introduce
> a new config or keyword to add support to delete unnecessary partitions from the metastore.
> This way users who want the old behavior can easily turn it off.
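
The repair behavior proposed in the description amounts to reconciling two sets of partitions: those registered in the metastore and those present on the file system. The following is a minimal Python sketch of that idea only, not Hive's implementation. The function name `plan_partition_repair` and the sample partition lists are hypothetical and chosen purely for illustration.

```python
def plan_partition_repair(metastore_partitions, filesystem_partitions):
    """Compare two partition listings and return (to_add, to_drop).

    Hypothetical helper for illustration; it does not call Hive.
    to_add:  partitions found on the file system but not in the metastore
             (what msck repair already handles, per the description).
    to_drop: partitions in the metastore whose directories are gone
             (the behavior this issue asks for).
    """
    metastore = set(metastore_partitions)
    filesystem = set(filesystem_partitions)
    to_add = sorted(filesystem - metastore)
    to_drop = sorted(metastore - filesystem)
    return to_add, to_drop


if __name__ == "__main__":
    # Sample data (assumed): one partition directory was deleted by hand,
    # and one new directory has not been registered yet.
    metastore = ["datetime=2017101215", "datetime=2017102605"]
    filesystem = ["datetime=2017102605", "datetime=2017102606"]

    to_add, to_drop = plan_partition_repair(metastore, filesystem)
    print("add:", to_add)
    print("drop:", to_drop)
```

In this sketch, only the `to_add` half corresponds to what the description says msck repair currently does. The `to_drop` half is the stale metadata that, in the comment above, leads Spark SQL to fail with a FileNotFoundException.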



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
