hive-dev mailing list archives

From "Grisha Trubetskoy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-2408) Perpetually degrading performance in checkPaths
Date Thu, 25 Aug 2011 14:06:28 GMT
Perpetually degrading performance in checkPaths
-----------------------------------------------

                 Key: HIVE-2408
                 URL: https://issues.apache.org/jira/browse/HIVE-2408
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
    Affects Versions: 0.7.1, 0.8.0
            Reporter: Grisha Trubetskoy


In ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, checkPaths() appends a copy_N
suffix when the destination file already exists, incrementing N until it finds an available
file name. The problem is that each exists() check is quite expensive in HDFS. Because every
new copy must first probe all of the names already taken, the cost keeps growing as files
accumulate, and with hundreds of files to go through this becomes a serious bottleneck.
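To make the growth concrete, here is a small simulation of the linear copy_N probe. It is not Hive's code: an in-memory Set stands in for the HDFS directory, and the class CopyNProbe, its reserve() and exists() methods, and the file name "000000_0" are all hypothetical. Each exists() call is counted because in HDFS each one would be a separate lookup against the NameNode.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of linear copy_N probing (not Hive's actual code).
public class CopyNProbe {
    // In-memory stand-in for the destination directory in HDFS.
    private final Set<String> files = new HashSet<>();
    private long existsCalls = 0;

    // Stand-in for FileSystem.exists(); every lookup is counted.
    boolean exists(String name) {
        existsCalls++;
        return files.contains(name);
    }

    // Try name, name_copy_1, name_copy_2, ... until a free name is found.
    String reserve(String name) {
        String candidate = name;
        for (int n = 1; exists(candidate); n++) {
            candidate = name + "_copy_" + n;
        }
        files.add(candidate);
        return candidate;
    }

    long existsCalls() {
        return existsCalls;
    }

    public static void main(String[] args) {
        CopyNProbe dir = new CopyNProbe();
        // Load a file with the same (illustrative) name 100 times.
        for (int i = 0; i < 100; i++) {
            dir.reserve("000000_0");
        }
        System.out.println(dir.existsCalls()); // prints 5050
    }
}
```

The k-th copy needs k + 1 checks, so n loads of the same name cost n(n + 1)/2 exists() calls in total: 5050 for just 100 files.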

A better solution would be to include a timestamp in the file name and then apply the
"copy_N" scheme only if that name is still taken.
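A minimal sketch of the proposed naming, under the same in-memory simulation assumptions: the class TimestampNaming, its reserve() method, and the "name_timestamp" format are hypothetical choices for illustration, not an agreed design. The timestamp is passed in as a parameter so the example is deterministic; a real loader might use System.currentTimeMillis().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: timestamp in the name first, copy_N only on collision.
public class TimestampNaming {
    // In-memory stand-in for the destination directory in HDFS.
    private final Set<String> files = new HashSet<>();
    private long existsCalls = 0;

    // Stand-in for FileSystem.exists(); every lookup is counted.
    boolean exists(String name) {
        existsCalls++;
        return files.contains(name);
    }

    // Build name_timestamp; fall back to name_timestamp_copy_N if it is taken.
    String reserve(String name, long timestamp) {
        String base = name + "_" + timestamp;
        String candidate = base;
        for (int n = 1; exists(candidate); n++) {
            candidate = base + "_copy_" + n;
        }
        files.add(candidate);
        return candidate;
    }

    long existsCalls() {
        return existsCalls;
    }

    public static void main(String[] args) {
        TimestampNaming dir = new TimestampNaming();
        // 100 loads of the same name, each at a distinct (arbitrary) timestamp.
        for (int i = 0; i < 100; i++) {
            dir.reserve("000000_0", 1000L + i);
        }
        System.out.println(dir.existsCalls()); // prints 100
    }
}
```

With distinct timestamps each file costs a single exists() call regardless of how many copies already exist. Collisions are still possible when several files share a timestamp, but the copy_N probe then only walks the names created at that same instant rather than every copy ever loaded.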

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
