hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed
Date Wed, 10 Feb 2010 19:04:32 GMT
Optimize CombineHiveFileInputFormat execution speed
---------------------------------------------------

                 Key: HIVE-1149
                 URL: https://issues.apache.org/jira/browse/HIVE-1149
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao


When there are many files and many pools, CombineHiveFileInputFormat is slow.
One of the culprits is the "new URI" call in the following function, which runs once per
entry in pathToPartitionInfo on every lookup. We should try to get rid of it.

{code}
  protected static PartitionDesc getPartitionDescFromPath(
      Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
    // The format of the keys in pathToPartitionInfo sometimes contains a port
    // and sometimes doesn't, so we just compare paths.
    for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
        .entrySet()) {
      try {
        if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
          return entry.getValue();
        }
      } catch (URISyntaxException e2) {
        // Keys that are not valid URIs are skipped silently.
      }
    }
    throw new IOException("cannot find dir = " + dir.toString()
        + " in partToPartitionInfo!");
  }
{code}
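
One possible direction, shown only as a rough sketch and not as the actual fix: parse each key once, build a map keyed by the path component alone, and answer later lookups with a hash lookup. The class name PathOnlyIndex and the lookup method are hypothetical, and a generic type V stands in for PartitionDesc so the example compiles without Hadoop or Hive on the classpath.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: V stands in for PartitionDesc.
class PathOnlyIndex<V> {
  // Maps the path component of each key (scheme, host and port dropped) to its value.
  private final Map<String, V> byPath = new HashMap<>();

  // Parses every key exactly once, instead of once per lookup.
  PathOnlyIndex(Map<String, V> pathToPartitionInfo) {
    for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
      try {
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // Keep the first entry seen for a path, like the loop that returns on first match.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Malformed keys are skipped, mirroring the original function.
      }
    }
  }

  // dirPath is assumed to be the path component only, e.g. dir.toUri().getPath().
  // Returns null when no key has that path; the caller decides whether to throw.
  V lookup(String dirPath) {
    return byPath.get(dirPath);
  }
}
```

With this shape, the cost of "new URI" is paid once per key when the index is built, so the index would need to be rebuilt (or cached per job) whenever pathToPartitionInfo changes.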


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

