hive-dev mailing list archives

From "Q Long (JIRA)" <>
Subject [jira] [Created] (HIVE-2087) Dynamic partition insert performance problem
Date Thu, 31 Mar 2011 18:24:05 GMT
Dynamic partition insert performance problem

                 Key: HIVE-2087
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.7.0
         Environment: Amazon EMR, S3
            Reporter: Q Long

Create an external table T backed by S3 and partition it by column P. Populate table
T so that it has a large number of partitions (say 100), then execute a statement like

insert overwrite table T partition (p) select * from another_table

Check the Hive server log: it shows that all existing partitions are read and loaded
before any mapper starts working. This seems excessive, given that the insert statement may
create or overwrite only a small number of partitions. Is there some other reason that an
insert using dynamic partitions requires loading the whole table?
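
For anyone trying to reproduce this, the steps above might look roughly like the HiveQL below. This is only a sketch: the bucket and path, the column names (id, val), and the layout of another_table are assumptions for illustration and are not taken from the report. The two SET lines enable fully dynamic partition inserts, which a statement with no static partition value normally requires.

```sql
-- Hypothetical reproduction sketch; the S3 location and column names are placeholders.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- External table backed by S3, partitioned by p.
CREATE EXTERNAL TABLE T (id STRING, val STRING)
PARTITIONED BY (p STRING)
LOCATION 's3://example-bucket/tables/T/';

-- another_table is assumed to have columns (id, val, p).
-- With dynamic partitioning, the partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT id, val, p FROM another_table;
```

Running the insert once with about 100 distinct values of p in another_table should populate the partitions. Running it again with only a few distinct values is the case where loading every existing partition up front looks unnecessary.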

This message is automatically generated by JIRA.