hive-issues mailing list archives

From "Subramanyam Pattipaka (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
Date Mon, 15 Aug 2016 22:08:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421763#comment-15421763 ]

Subramanyam Pattipaka edited comment on HIVE-14511 at 8/15/16 10:07 PM:
------------------------------------------------------------------------

[~sershe], even if we introduce another command that is flexible enough to cater to this scenario,
what if the directory structure of the user's data has changed? Why should the user have to recreate
all tables again? Why not make repair table flexible as well (with this patch), so that the configs
mapred.input.dir.recursive and hive.mapred.supports.subdirectories are supported and the relevant
partitions are added? Furthermore, having two commands may be confusing.

I don't mean that we should add a file such as a=1/000000_0. I mean only that we should ignore
such files and, if a config is enabled, list them in the error log so that users can act on them.
ERROR is better than DEBUG here. This way, all configurations would give these details. For example,
suppose we have the following files

tbldir/a=1/file1.txt
tbldir/a=2/b=1/file2.txt
tbldir/a=2/b=1/c=1/file3.txt

and we are trying to create a partitioned table with partitions on a and b and with root directory
tbldir.

Here, if the ignore config is set, the ERROR log would say that the file tbldir/a=1/file1.txt is
being ignored due to incorrect structure. Otherwise, the operation fails.

We add only one partition with values (2, 1).
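
A minimal sketch of the behavior described above, written in Python for illustration rather than taken from Hive's Java code or from the attached patch. The names discover_partitions and ignore_invalid are hypothetical and are not Hive APIs or config keys. The sketch assumes that a file in a subdirectory below the last partition folder (such as c=1 here) counts toward that partition when subdirectories are supported, and that a file above the last partition folder is either reported and ignored or makes the operation fail.

```python
from pathlib import PurePosixPath


def discover_partitions(root, part_cols, files, ignore_invalid=True):
    """Return (partitions, ignored) for the given file paths.

    A file counts toward a partition when the directories directly below
    `root` are `col=value` for every column in `part_cols`, in order.
    Any deeper directories are treated as subdirectories of that partition.
    """
    partitions = set()
    ignored = []
    for f in files:
        rel = PurePosixPath(f).relative_to(root)
        dirs = rel.parts[:-1]  # directory components, file name dropped
        values = []
        for col, d in zip(part_cols, dirs):
            name, sep, value = d.partition("=")
            if sep != "=" or name != col:
                break  # directory does not match the expected column
            values.append(value)
        if len(values) == len(part_cols):
            partitions.add(tuple(values))
        elif ignore_invalid:
            # Stand-in for the ERROR log entry discussed above.
            ignored.append(f)
        else:
            raise ValueError(f"incorrect partition structure: {f}")
    return sorted(partitions), ignored


files = [
    "tbldir/a=1/file1.txt",
    "tbldir/a=2/b=1/file2.txt",
    "tbldir/a=2/b=1/c=1/file3.txt",
]
partitions, ignored = discover_partitions("tbldir", ["a", "b"], files)
print(partitions)  # [('2', '1')]
print(ignored)     # ['tbldir/a=1/file1.txt']
```

With ignore_invalid=False the same input raises an error on tbldir/a=1/file1.txt, which corresponds to the "operation fails" case.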

msck is still strict; the ask here is to support the configs mapred.input.dir.recursive and
hive.mapred.supports.subdirectories.



was (Author: pattipaka):
[~sershe], even if we introduce another command that is flexible enough to cater to this scenario,
what if the directory structure of the user's data has changed? Why should the user have to recreate
all tables again? Why not make repair table flexible as well (with this patch), so that the configs
mapred.input.dir.recursive and hive.mapred.supports.subdirectories are supported and the relevant
partitions are added? Furthermore, having two commands may be confusing.

I don't mean that we should add a file such as a=1/000000_0. I mean only that we should ignore
such files and, if a config is enabled, list them in the error log so that users can act on them.
ERROR is better than DEBUG here. This way, all configurations would give these details. For example,
suppose we have the following files

tbldir/a=1/file1.txt
tbldir/a=2/b=1/file2.txt

and we are trying to create a partitioned table with partitions on a and b and with root directory
tbldir.

Here, if the ignore config is set, the ERROR log would say that the file tbldir/a=1/file1.txt is
being ignored due to incorrect structure. Otherwise, the operation fails.

We add only one partition with values (2, 1).

msck is still strict; the ask here is to support the configs mapred.input.dir.recursive and
hive.mapred.supports.subdirectories.


> Improve MSCK for partitioned table to deal with special cases
> -------------------------------------------------------------
>
>                 Key: HIVE-14511
>                 URL: https://issues.apache.org/jira/browse/HIVE-14511
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14511.01.patch
>
>
> Some users will have a folder rather than a file under the last partition folder. However,
> msck is going to search for the leaf folder rather than the last partition folder. We need
> to improve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
