hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed
Date Wed, 10 Feb 2010 19:04:32 GMT
Optimize CombineHiveFileInputFormat execution speed

                 Key: HIVE-1149
                 URL: https://issues.apache.org/jira/browse/HIVE-1149
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao

When there are many files and many pools, CombineHiveFileInputFormat is quite slow.
One of the culprits is the "new URI" call in the following function, which is made once
for every entry of pathToPartitionInfo that the loop examines. We should try to get rid
of it.

  protected static PartitionDesc getPartitionDescFromPath(
      Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
    // The format of the keys in pathToPartitionInfo sometimes contains a port
    // and sometimes doesn't, so we just compare paths.
    for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
        .entrySet()) {
      try {
        // This URI is parsed again for every entry on every call.
        if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
          return entry.getValue();
        }
      } catch (URISyntaxException e2) {
        // Ignore keys that are not valid URIs and keep looking.
      }
    }
    throw new IOException("cannot find dir = " + dir.toString()
        + " in partToPartitionInfo!");
  }
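
One possible way to avoid the repeated "new URI" call is to parse each key once and
index the entries by their path component, so that later lookups are plain hash
lookups. The sketch below is only an illustration of that idea, not the fix adopted
for this issue. PathOnlyIndex and its methods are hypothetical names, and a plain
String path and a generic value type stand in for Hadoop's Path and Hive's
PartitionDesc so that the example is self-contained.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hive): indexes values by the path
 * component of each key so that lookups need no URI parsing.
 */
final class PathOnlyIndex<V> {
  private final Map<String, V> byPath = new HashMap<>();

  /** Parses each key exactly once; keys that are not valid URIs are skipped. */
  PathOnlyIndex(Map<String, V> pathToInfo) {
    for (Map.Entry<String, V> entry : pathToInfo.entrySet()) {
      try {
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // If two keys share a path, keep whichever is seen first.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Same policy as the original loop: ignore malformed keys.
      }
    }
  }

  /** Returns the value registered for this path, or null if there is none. */
  V get(String dirPath) {
    return byPath.get(dirPath);
  }
}
```

With such an index built once per pathToPartitionInfo map, the per-directory cost
drops from one URI parse per map entry to a single hash lookup. In the real code the
caller would pass dir.toUri().getPath() and throw the IOException when the result is
null.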

