hadoop-hive-dev mailing list archives

From "Prasad Chakka (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.
Date Mon, 12 Oct 2009 04:36:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764542#action_12764542 ]

Prasad Chakka commented on HIVE-493:
------------------------------------

Cyrus, 

Thanks for providing this patch. Very useful.

It is possible that on an HDFS with permissions enabled, a partition/table directory is not
accessible to the current user, yet its metadata would be deleted here, so I am a little
uncomfortable with removing partitions. I am not sure there is that much utility in removing
partitions compared to the risk of losing partitions permanently. What do you think?
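The trade-off raised above can be illustrated with a conservative reconciliation that only ever adds partitions. This is a hypothetical sketch, not the code in the attached patch: the class `PartitionReconciler`, its `plan` method, and partition names such as `ds=2009-10-12` are assumptions for illustration. Partitions that are registered but whose directories were not seen are only reported, because an unreadable directory (for example under HDFS permissions) looks the same as a missing one.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Hive's metastore checker: decides which
// partitions a "repair" may add, and which it should only report.
final class PartitionReconciler {

    // Result of comparing registered partitions with directories seen on disk.
    static final class Plan {
        final Set<String> toAdd = new TreeSet<>();      // on disk, not registered
        final Set<String> reportOnly = new TreeSet<>(); // registered, directory not seen

        @Override
        public String toString() {
            return "add=" + toAdd + " reportOnly=" + reportOnly;
        }
    }

    static Plan plan(Set<String> registered, Set<String> foundOnDisk) {
        Plan p = new Plan();
        for (String name : foundOnDisk) {
            if (!registered.contains(name)) {
                p.toAdd.add(name);
            }
        }
        for (String name : registered) {
            if (!foundOnDisk.contains(name)) {
                // Deliberately never dropped: the directory may exist but be
                // unreadable by the current user, so deleting the metadata
                // could lose the partition permanently.
                p.reportOnly.add(name);
            }
        }
        return p;
    }
}
```

For example, with registered partitions `{ds=1, ds=2}` and directories `{ds=2, ds=3}` found on disk, the plan adds `ds=3` and only reports `ds=1`; dropping stays a separate, explicit user action.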

A couple of comments on the code:
1) Can you add a test or two to the msck test package.
2) REPAIR should be an optional keyword in the MSCK ANTLR clause instead of a whole separate
clause. Look at how KW_EXTERNAL is used in the createStatement clause.
3) The following line should be outside the for loop, since there is only one table here.
{code}
Table table = db.getTable(MetaStoreUtils.DEFAULT_DATABASE_NAME,
                msckDesc.getTableName());
{code}
4) Is this cast '(Map <String, String>)' really needed?
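Points 3 and 4 can be illustrated together. The sketch below uses a hypothetical `TableLookup` interface and `HoistExample` class standing in for the metastore call (they are not Hive's real `Hive`/`Table` API, and the table is modeled as a plain `String`). The table is fetched once before the loop because its name does not change between iterations, and because the partition specs are declared as `Map<String, String>`, no cast is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the metastore lookup; not Hive's real API.
interface TableLookup {
    String getTable(String dbName, String tableName);
}

final class HoistExample {

    // Builds one description per partition spec. The table lookup is
    // loop-invariant (one table per MSCK command), so it happens once.
    static List<String> describe(TableLookup db, String dbName, String tableName,
                                 List<Map<String, String>> specs) {
        String table = db.getTable(dbName, tableName); // hoisted out of the loop
        List<String> out = new ArrayList<>();
        for (Map<String, String> spec : specs) {        // typed element: no cast needed
            out.add(table + " " + spec);
        }
        return out;
    }
}
```

Counting calls on a stub `TableLookup` is one way to check in a unit test that the lookup happens once per command rather than once per partition.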



> automatically infer existing partitions of table from HDFS files.
> -----------------------------------------------------------------
>
>                 Key: HIVE-493
>                 URL: https://issues.apache.org/jira/browse/HIVE-493
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>         Attachments: HIVE-493.patch
>
>
> Initially, the partition list for a table was inferred from the HDFS directory structure
> instead of from the metastore (where partitions are created using 'alter table ... add partition'),
> but this automatic inference was removed in favor of the latter approach when the metastore
> checker feature was checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simple for users to create the HDFS directory
> and let Hive infer the partition rather than explicitly add it. But doing that raises the following:
> 1) External partitions -- we would have to mix both approaches, so the partition list becomes
> a merged list of inferred and registered partitions, and duplicates have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen for an inferred
> partition: the table schema when the partition was created, or the latest table schema? And how
> would we know the table schema at the time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without actually deleting the data.
> This feature is not supported and may not be that useful, but nevertheless it can't be
> supported with inferred partitions.
> 4) Indexes are being added, so if partitions are not registered, indexes for such partitions
> cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of Hive. If inferred
> partitions are preferred, can we live with the restricted functionality that this imposes?
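Inferring a partition from the directory structure amounts to parsing a relative path such as `ds=2009-10-12/hr=04` (one `key=value` directory per partition key) into a partition spec. The sketch below is hypothetical: `PartitionPathParser` is an illustrative name, it does not unescape special characters in values, and it does not address the schema question in point 2.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: infer a partition spec from a relative directory
// path such as "ds=2009-10-12/hr=04". Escaped characters are not handled.
final class PartitionPathParser {

    // Returns the spec in partition-key order, or null if the path does not
    // match the table's partition keys exactly.
    static Map<String, String> parse(String relativePath, List<String> partitionKeys) {
        String[] parts = relativePath.split("/");
        if (parts.length != partitionKeys.size()) {
            return null; // too few or too many directory levels
        }
        Map<String, String> spec = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0 || eq == parts[i].length() - 1) {
                return null; // not key=value, or empty key/value
            }
            String key = parts[i].substring(0, eq);
            if (!key.equals(partitionKeys.get(i))) {
                return null; // wrong key or wrong nesting order
            }
            spec.put(key, parts[i].substring(eq + 1));
        }
        return spec;
    }
}
```

In this sketch a path whose keys are out of order, such as `hr=04/ds=2009-10-12`, yields `null` rather than a reordered spec, so stray directories are ignored instead of being registered under a guessed layout.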

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

