hive-dev mailing list archives

From "Q Long (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-2087) Dynamic partition insert performance problem
Date Thu, 31 Mar 2011 18:24:05 GMT
Dynamic partition insert performance problem
--------------------------------------------

                 Key: HIVE-2087
                 URL: https://issues.apache.org/jira/browse/HIVE-2087
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.7.0
         Environment: Amazon EMR, S3
            Reporter: Q Long


Create an external (S3-backed) table T, partitioned by column P. Populate table
T so that it has a large number of partitions (say 100). Then execute a statement like

insert overwrite table T partition (p) select * from another_table

Check the Hive server log: it shows that all existing partitions are read and loaded
before any mapper starts working. This seems excessive, given that the insert statement may
create or overwrite only a very small number of partitions. Is there some other reason that
an insert using dynamic partitions requires loading the whole table?
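
For anyone trying to reproduce this, the steps above might look roughly like the
following sketch. The bucket path, the column names (id, val), and the layout of
another_table are hypothetical and not taken from the actual setup; the explicit
column list stands in for the "select *" above.

```sql
-- Enable dynamic partition inserts. Nonstrict mode allows every
-- partition column to be dynamic, as in the statement above.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical external table backed by S3 (bucket path is an assumption).
CREATE EXTERNAL TABLE T (id INT, val STRING)
PARTITIONED BY (p STRING)
LOCATION 's3://example-bucket/warehouse/T/';

-- The dynamic partition column must come last in the select list.
-- Assumes another_table exposes columns id, val, and p.
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT id, val, p FROM another_table;
```

With T already holding many partitions, the server log can then be checked for
partition loading that happens before the first mapper starts.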



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
