hive-dev mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Created] (HIVE-11940) "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
Date Wed, 23 Sep 2015 23:04:04 GMT
Sergio Peña created HIVE-11940:
----------------------------------

             Summary: "INSERT OVERWRITE" query is very slow because it creates one "distcp"
per file to copy data from staging directory to target directory
                 Key: HIVE-11940
                 URL: https://issues.apache.org/jira/browse/HIVE-11940
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.1
            Reporter: Sergio Peña
            Assignee: Sergio Peña


When hive.exec.stagingdir is set to ".hive-staging", the staging directory is created under the
target directory. When an "INSERT OVERWRITE" query runs, Hive then takes every file under the
staging directory and copies them ONE BY ONE to the target directory, launching one "distcp" per file.

When hive.exec.stagingdir is set to "/tmp/hive", Hive simply performs a RENAME operation, which
is instant.

This happens with files that are not encrypted. 
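
To illustrate why the two paths differ so much in cost, here is a minimal sketch on a local
filesystem. It is not Hive's code and does not model how Hive decides between the two paths; the
function names (publish_by_copy, publish_by_rename, make_staging) and the file count of 100 are
assumptions made for illustration only. It only shows that a per-file copy costs one operation
per staged file, while a rename is a single operation no matter how many files there are.

```python
import os
import shutil
import tempfile


def publish_by_copy(staging_dir, target_dir):
    """Copy each staged file individually; return the number of copy operations."""
    operations = 0
    for name in sorted(os.listdir(staging_dir)):
        shutil.copy2(os.path.join(staging_dir, name), os.path.join(target_dir, name))
        operations += 1
    return operations


def publish_by_rename(staging_dir, target_dir):
    """Move the whole staging directory with one rename; return the number of operations."""
    os.rename(staging_dir, target_dir)  # target_dir must not exist yet
    return 1


def make_staging(root, name, file_count):
    """Create a hypothetical staging directory holding file_count small files."""
    staging = os.path.join(root, name)
    os.mkdir(staging)
    for i in range(file_count):
        with open(os.path.join(staging, f"part-{i:05d}"), "w") as f:
            f.write("row\n")
    return staging


with tempfile.TemporaryDirectory() as root:
    # Path 1: one copy per staged file (the slow case described above).
    staging_a = make_staging(root, "staging_a", 100)
    target_a = os.path.join(root, "target_a")
    os.mkdir(target_a)
    copy_ops = publish_by_copy(staging_a, target_a)
    copied = len(os.listdir(target_a))

    # Path 2: a single rename of the staging directory (the fast case).
    staging_b = make_staging(root, "staging_b", 100)
    target_b = os.path.join(root, "target_b")
    rename_ops = publish_by_rename(staging_b, target_b)
    renamed = len(os.listdir(target_b))

print(f"copy operations: {copy_ops}, rename operations: {rename_ops}")
```

In the reported case each per-file operation is a separate "distcp", so the cost per file is
far higher than a local copy, which makes the gap wider than this sketch suggests.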



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
