hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16299) MSCK REPAIR TABLE should enforce partition key order when adding unknown partitions
Date Fri, 31 Mar 2017 00:36:41 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar updated HIVE-16299:
---------------------------------------
    Attachment: HIVE-16299.02.patch

Updating the patch with a better implementation. The patch changes the parallel file listing
algorithm so that directories which do not follow the partition key specification are not
searched. This early-exit strategy will also help improve response time on slower filesystems
like S3 and when the partition directory structure does not conform to the partition definitions.
Depending on the value of the {{hive.msck.path.validation}} configuration, MSCK will either throw
an exception or log a warning.
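The directory pruning described above can be illustrated with a small sketch. This is not the code in HIVE-16299.02.patch. It makes several assumptions: the directory tree is modeled in memory by a hypothetical `Dir` class instead of a Hadoop `FileSystem`, the listing is sequential rather than parallel, and a boolean `throwOnInvalid` stands in for the `hive.msck.path.validation` setting. At depth i, the walk descends only into a directory named `<partCols[i]>=<value>`. Any other directory is rejected at once, so its subtree is never listed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionWalkSketch {

    // Hypothetical in-memory stand-in for a filesystem directory (not a Hadoop API).
    static final class Dir {
        final Map<String, Dir> children = new LinkedHashMap<>();

        // Creates (or reuses) the nested child directories named by relPath.
        Dir mkdirs(String relPath) {
            Dir cur = this;
            for (String part : relPath.split("/")) {
                cur = cur.children.computeIfAbsent(part, k -> new Dir());
            }
            return cur;
        }
    }

    // Returns partition names whose directory levels match partCols in declared order.
    // throwOnInvalid is a hypothetical stand-in for hive.msck.path.validation.
    static List<String> findPartitions(Dir table, List<String> partCols, boolean throwOnInvalid) {
        List<String> found = new ArrayList<>();
        walk(table, partCols, 0, "", found, throwOnInvalid);
        return found;
    }

    private static void walk(Dir dir, List<String> partCols, int depth, String prefix,
                             List<String> found, boolean throwOnInvalid) {
        if (depth == partCols.size()) {
            found.add(prefix); // every partition key matched, in order
            return;
        }
        String expected = partCols.get(depth) + "=";
        for (Map.Entry<String, Dir> e : dir.children.entrySet()) {
            String name = e.getKey();
            String path = prefix.isEmpty() ? name : prefix + "/" + name;
            if (name.startsWith(expected) && name.length() > expected.length()) {
                walk(e.getValue(), partCols, depth + 1, path, found, throwOnInvalid);
            } else if (throwOnInvalid) {
                throw new IllegalStateException("Invalid partition directory: " + path);
            } else {
                // Early exit: this subtree is skipped without listing its contents.
                System.err.println("WARN skipping invalid partition directory: " + path);
            }
        }
    }

    public static void main(String[] args) {
        Dir table = new Dir();
        table.mkdirs("a=1/a=2/a=3/b=4/c=5"); // repeated key, as in example 1 of the report
        table.mkdirs("c=3/b=2/a=1");         // keys out of order, as in example 2
        table.mkdirs("a=3/b=4/c=5");         // correctly ordered
        System.out.println(findPartitions(table, List.of("a", "b", "c"), false)); // [a=3/b=4/c=5]
    }
}
```

In this sketch the walk stops at `a=1/a=2` and at `c=3` without descending further. On an object store such as S3, where each listing is a remote call, skipping those subtrees is where the saving would come from.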

> MSCK REPAIR TABLE should enforce partition key order when adding unknown partitions
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16299
>                 URL: https://issues.apache.org/jira/browse/HIVE-16299
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Dudu Markovitz
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16299.01.patch, HIVE-16299.02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> static String getPartitionName(Path tablePath, Path partitionPath, Set<String> partCols)
> ------------------------------------------------------------------------------------
> MSCK REPAIR validates that each sub-directory is in the format col=val and that there
> is indeed a partition column named "col".
> However, it does not validate the position of the partition column. As a result, false
> partitions are being created, and so are directories that match those partitions.
> e.g. 1
> hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
> hive> create external table t (i int) partitioned by (a int,b int,c int) ;
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:	t:a=1/a=2/a=3/b=4/c=5
> Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
> Time taken: 0.563 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=3/b=4/c=5
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4/c=5
> e.g. 2
> hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
> hive> create external table t (i int) partitioned by (a int,b int,c int);
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:	t:c=3/b=2/a=1
> Repair: Added partition to metastore t:c=3/b=2/a=1
> Time taken: 0.512 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=1/b=2/c=3
> hive> dfs -ls -R  /user/hive/warehouse/t;
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2/a=1
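The missing position check in `getPartitionName` can be illustrated with a simplified sketch. It is not the fix that was committed for HIVE-16299. It uses plain `String` paths instead of Hadoop `Path` objects. It also takes an ordered `List<String>` of partition columns, because the `Set<String> partCols` in the original signature offers no positional lookup. The class name `OrderedPartitionName` is hypothetical. Each path component below the table location must be `col=val`, where `col` is the partition column declared at that position, so both examples above would be rejected.

```java
import java.util.List;

public class OrderedPartitionName {

    // Returns the partition name (e.g. "a=1/b=2/c=3") for partitionPath, or throws if the
    // directories below tablePath do not name each partition column once, in declared order.
    static String getPartitionName(String tablePath, String partitionPath, List<String> partCols) {
        String prefix = tablePath.endsWith("/") ? tablePath : tablePath + "/";
        if (!partitionPath.startsWith(prefix)) {
            throw new IllegalArgumentException(partitionPath + " is not under " + tablePath);
        }
        String[] parts = partitionPath.substring(prefix.length()).split("/");
        if (parts.length != partCols.size()) {
            throw new IllegalArgumentException("Expected " + partCols.size()
                    + " partition levels but found " + parts.length + ": " + partitionPath);
        }
        for (int i = 0; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            String col = eq > 0 ? parts[i].substring(0, eq) : null;
            // Reject a missing '=', a column name out of position, or an empty value.
            if (!partCols.get(i).equals(col) || eq == parts[i].length() - 1) {
                throw new IllegalArgumentException("Expected partition column '" + partCols.get(i)
                        + "' at level " + (i + 1) + " but found '" + parts[i] + "'");
            }
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String table = "/user/hive/warehouse/t";
        List<String> cols = List.of("a", "b", "c");
        System.out.println(getPartitionName(table, table + "/a=1/b=2/c=3", cols)); // a=1/b=2/c=3
        for (String bad : List.of("a=1/a=2/a=3/b=4/c=5", "c=3/b=2/a=1")) {
            try {
                getPartitionName(table, table + "/" + bad, cols);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

With this check, example 1 fails the level count (five directories for three keys), and example 2 fails at the first level because `c` appears where `a` is expected.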



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
