hadoop-hive-dev mailing list archives

From "Prasad Chakka (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-493) automatically infer existing partitions of table from HDFS files.
Date Mon, 18 May 2009 01:11:45 GMT
automatically infer existing partitions of table from HDFS files.
-----------------------------------------------------------------

                 Key: HIVE-493
                 URL: https://issues.apache.org/jira/browse/HIVE-493
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Metastore, Query Processor
    Affects Versions: 0.3.0, 0.3.1, 0.4.0
            Reporter: Prasad Chakka


Initially, the partition list for a table was inferred from the HDFS directory structure instead of
being looked up in the metastore, where partitions are created using 'alter table ... add partition'.
This inference was removed in favor of the metadata lookup during the metastore checker work, and
also to facilitate external partitions.
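
As a rough illustration of what "inferring from the directory structure" means, here is a minimal
Python sketch. It assumes partition directories follow the usual key=value layout below the table
directory. The function name and the example paths are hypothetical and are not part of Hive.

```python
from pathlib import PurePosixPath


def infer_partition_spec(table_root: str, partition_dir: str) -> dict:
    """Parse the 'key=value' path components below table_root into a partition spec.

    Hypothetical helper for illustration only; not a Hive API.
    """
    # Raises ValueError if partition_dir is not located under table_root.
    relative = PurePosixPath(partition_dir).relative_to(PurePosixPath(table_root))
    spec = {}
    for component in relative.parts:
        key, sep, value = component.partition("=")
        if not sep or not key:
            raise ValueError(f"not a partition directory component: {component!r}")
        spec[key] = value
    return spec


# Example paths are assumptions made up for this sketch.
spec = infer_partition_spec(
    "/user/hive/warehouse/page_views",
    "/user/hive/warehouse/page_views/ds=2009-05-17/hr=12",
)
print(spec)
```

Note that a directory outside the table root (an external partition) cannot be discovered this way,
which is one reason the two approaches would have to be combined.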

Joydeep and Frederick mentioned that it would be simpler for users to create the HDFS directory
and let Hive infer the partition rather than explicitly add it. But doing that raises the following...

1) External partitions -- we would have to mix both approaches, so the partition list becomes a merged
list of inferred partitions and registered partitions, and duplicates have to be resolved.
2) Partition-level schemas can't be supported. Which schema should be chosen for an inferred partition:
the table schema at the time the inferred partition was created, or the latest table schema? How do
we know what the table schema was when the inferred partition was created?
3) If partitions have to be registered, a partition can be disabled without actually deleting
the data. This feature is not currently supported and may not be that useful, but it could not
be supported at all with inferred partitions.
4) Indexes are being added. If partitions are not registered, then indexes for such partitions
cannot be maintained automatically.
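
The merge described in point 1 could look roughly like the Python sketch below. The duplicate
policy used here (a registered partition overrides an inferred one with the same spec) is only an
assumption for illustration; the issue leaves that question open. All names and paths are
hypothetical.

```python
def merge_partitions(registered: dict, inferred: dict) -> dict:
    """Merge two {partition_spec: location} maps into one partition list.

    Assumed policy (not decided in the issue): when the same spec appears in
    both maps, the registered entry wins, because only the metastore can know
    about an external location.
    """
    merged = {}
    for spec, location in inferred.items():
        merged[spec] = {"location": location, "source": "inferred"}
    for spec, location in registered.items():
        # Overwrites any inferred entry that has the same spec.
        merged[spec] = {"location": location, "source": "registered"}
    return merged


# Partition specs are tuples of (key, value) pairs so they can be dict keys.
registered = {
    (("ds", "2009-05-16"),): "/data/external/page_views/2009-05-16",
    (("ds", "2009-05-17"),): "/user/hive/warehouse/page_views/ds=2009-05-17",
}
inferred = {
    (("ds", "2009-05-17"),): "/user/hive/warehouse/page_views/ds=2009-05-17",
    (("ds", "2009-05-18"),): "/user/hive/warehouse/page_views/ds=2009-05-18",
}

merged = merge_partitions(registered, inferred)
for spec, info in sorted(merged.items()):
    print(spec, info["source"], info["location"])
```

Even with such a merge, the inferred entries carry no schema, enabled/disabled state, or index
information, which is what points 2 to 4 are about.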

I would like to know what the general thinking about this is among users of Hive. If inferred
partitions are preferred, can we live with the restricted functionality that this imposes?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

