hive-issues mailing list archives

From "Syed Shameerur Rahman (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23347) MSCK REPAIR cannot discover partitions with upper case directory names.
Date Mon, 04 May 2020 12:53:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098894#comment-17098894 ]

Syed Shameerur Rahman commented on HIVE-23347:
----------------------------------------------

[~adeshrao]
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L386

We remove all the known partition paths (fetched from the metastore) from the partition paths
listed from the fileSystem. A partition path fetched from the metastore always has lower-case
partition column names, while a partition path listed from the fileSystem might have upper-case
column names, so we might end up not removing a partition path that is already present.
E.g.:
partition from metastore: <tablepath>/year=2020/month=3/day=2
partition from fileSystem: <tablepath>/Year=2020/Month=3/Day=2
Both these paths should be considered the same and hence removed from *allPartDirs*. I guess
HIVE-23347.3.patch doesn't handle that case.
So I guess it is better to tackle this issue at the place where the partition paths are fetched
from the fileSystem.
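
To illustrate the idea, here is a minimal sketch (not the actual HiveMetaStoreChecker code, and not
any of the attached patches) of normalizing the partition column names in paths listed from the
fileSystem before the known metastore paths are removed. The class and method names are
hypothetical, paths are treated as plain strings, and it assumes that every path component
containing "=" is a partition component.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hive.
final class PartitionPathNormalizer {

    // Lower-cases the column name in each "name=value" path component.
    // Partition values and components without '=' are left unchanged.
    public static String normalize(String path) {
        String[] parts = path.split("/", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            String part = parts[i];
            int eq = part.indexOf('=');
            if (eq > 0) {
                sb.append(part.substring(0, eq).toLowerCase(Locale.ROOT));
                sb.append(part.substring(eq));
            } else {
                sb.append(part);
            }
        }
        return sb.toString();
    }

    // Normalizes the fileSystem paths first, then removes the paths already
    // known to the metastore (assumed to be lower case already).
    public static Set<String> unknownPartitions(Set<String> fsPaths,
                                                Set<String> metastorePaths) {
        Set<String> remaining = new LinkedHashSet<>();
        for (String p : fsPaths) {
            remaining.add(normalize(p));
        }
        remaining.removeAll(metastorePaths);
        return remaining;
    }
}
```

With this, <tablepath>/Year=2020/Month=3/Day=2 from the fileSystem matches
<tablepath>/year=2020/month=3/day=2 from the metastore and is removed, while partition values
keep their original case.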

> MSCK REPAIR cannot discover partitions with upper case directory names.
> -----------------------------------------------------------------------
>
>                 Key: HIVE-23347
>                 URL: https://issues.apache.org/jira/browse/HIVE-23347
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 3.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Adesh Kumar Rao
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-23347.01.patch, HIVE-23347.2.patch, HIVE-23347.3.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For the following scenario, we expect MSCK REPAIR to discover partitions, but it does not.
> 1. Have partitioned data paths as follows.
> hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=10
> hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=11
> 2. create external table t1 (key int, value string) partitioned by (Year int, Month int,
> Day int) stored as orc location 'hdfs://mycluster/datapath/t1';
> 3. msck repair table t1;
> 4. show partitions t1; --> Returns zero partitions
> 5. select * from t1; --> Returns empty data.
> When the partition directory names are changed to lower case, this works fine.
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=10
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=11



