hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed
Date Wed, 10 Feb 2010 19:06:32 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1149:
-----------------------------

      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> Optimize CombineHiveFileInputFormat execution speed
> ---------------------------------------------------
>
>                 Key: HIVE-1149
>                 URL: https://issues.apache.org/jira/browse/HIVE-1149
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Priority: Minor
>
> When there are many files and many pools, CombineHiveFileInputFormat is quite slow.
> One of the culprits is the "new URI" call in the following function, which runs once per map entry on every lookup. We should try to get rid of it.
> {code}
>   protected static PartitionDesc getPartitionDescFromPath(
>       Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
>     // The format of the keys in pathToPartitionInfo sometimes contains a port
>     // and sometimes doesn't, so we just compare paths.
>     for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
>         .entrySet()) {
>       try {
>         if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
>           return entry.getValue();
>         }
>       } catch (URISyntaxException e2) {
>         // Ignore keys that are not valid URIs and keep scanning.
>       }
>     }
>     throw new IOException("cannot find dir = " + dir.toString()
>         + " in partToPartitionInfo!");
>   }
> {code}
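
One possible way to reduce the cost described above, offered only as a sketch and not as the change actually made for HIVE-1149: parse each key once, build a map keyed by the path component, and answer later lookups with a single hash probe. The class name PartitionPathIndex, the generic value type V (standing in for PartitionDesc), and the use of java.net.URI in place of Hadoop's Path are all assumptions made so that the example compiles without Hive or Hadoop on the classpath.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hive): indexes partition entries by the
 * path component of their URI key, so scheme, host and port are ignored.
 */
final class PartitionPathIndex<V> {
  private final Map<String, V> byPath = new HashMap<>();

  /** Parses every key exactly once, at construction time. */
  PartitionPathIndex(Map<String, V> pathToPartitionInfo) {
    for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
      try {
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // Keep the first entry seen for a path, like the original loop.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Skip keys that are not valid URIs, as the original loop does.
      }
    }
  }

  /** Looks up a directory by path only; no URI parsing of the keys here. */
  V lookup(URI dir) throws IOException {
    V value = byPath.get(dir.getPath());
    if (value == null) {
      throw new IOException("cannot find dir = " + dir
          + " in pathToPartitionInfo!");
    }
    return value;
  }
}
```

With this shape, a caller would build the index once per job and reuse it for every directory. The index would need to be rebuilt if pathToPartitionInfo changes afterwards, which this sketch does not handle.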

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

