hadoop-hive-dev mailing list archives

From "Dave Lerman (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1006) getPartitionDescFromPath failing from CombineHiveInputFormat
Date Tue, 22 Dec 2009 15:12:29 GMT
getPartitionDescFromPath failing from CombineHiveInputFormat
------------------------------------------------------------

                 Key: HIVE-1006
                 URL: https://issues.apache.org/jira/browse/HIVE-1006
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.4.1
            Reporter: Dave Lerman


When HiveInputFormat.getPartitionDescFromPath is called from CombineHiveInputFormat, it sometimes
fails to return a matching partitionDesc. This causes an exception further down the line, because
the split doesn't have an inputFormatClassName.

The issue is that the path format used as the key in pathToPartitionInfo varies between stages.
In the first stage, it's the complete path as returned from the table definitions
(e.g. hdfs://server/path). In subsequent stages, it's the complete path, with port, of the
result of the previous stage (e.g. hdfs://server:8020/path). This isn't a problem in
HiveInputFormat, since the directory you're looking up always uses the same format as the keys.
In CombineHiveInputFormat, however, we take that path and look up its children in the file
system to get all the block information, and then use one of the returned paths to get the
partition info, and that returned path does not include the port. So, in any stage after the
first, we are looking for a path without the port, but all the keys in the map contain a port,
so we don't find a match.
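
To illustrate the mismatch, here is a minimal sketch that is not taken from the Hive source. It
assumes the map is keyed by plain path strings; the class name PathKeyMismatch, the helper
buildPathToPartitionInfo, and the String values are hypothetical stand-ins for the real
partitionDesc objects.

```java
import java.util.HashMap;
import java.util.Map;

public class PathKeyMismatch {
    // Hypothetical stand-in for pathToPartitionInfo in a stage after the first:
    // the key carries the port, as described in the report.
    static Map<String, String> buildPathToPartitionInfo() {
        Map<String, String> map = new HashMap<>();
        map.put("hdfs://server:8020/path", "partitionDesc for /path");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = buildPathToPartitionInfo();

        // Path as it comes back from the file system listing: no port.
        String fromFileSystem = "hdfs://server/path";

        // The hash lookup compares the strings exactly, so this misses.
        System.out.println(pathToPartitionInfo.get(fromFileSystem));

        // The same location written with the port is found.
        System.out.println(pathToPartitionInfo.get("hdfs://server:8020/path"));
    }
}
```

The two strings name the same location, but a hash lookup only matches keys that are equal
character for character.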

The attached patch may not be ideal: it doesn't fix the underlying problem of inconsistent
path formats in pathToPartitionInfo. It just works around the problem by walking through the map
and looking for a matching path rather than doing a hash lookup.
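
A rough sketch of what such a workaround could look like follows. It is not the attached patch.
The class PartitionLookup and the methods samePathIgnoringPort and findByPath are hypothetical,
and the matching rule (same scheme, host, and path, with the port ignored) is an assumption
made for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class PartitionLookup {
    // Assumed matching rule: two paths match when scheme, host, and path
    // component are equal; the port is deliberately ignored.
    static boolean samePathIgnoringPort(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return Objects.equals(ua.getScheme(), ub.getScheme())
            && Objects.equals(ua.getHost(), ub.getHost())
            && Objects.equals(ua.getPath(), ub.getPath());
    }

    // Walk through every entry instead of doing a hash lookup.
    // Returns null when no key matches.
    static <V> V findByPath(Map<String, V> pathToPartitionInfo, String path) {
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            if (samePathIgnoringPort(entry.getKey(), path)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new LinkedHashMap<>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc for /path");

        // The lookup path has no port, but the scan still finds the entry.
        System.out.println(findByPath(pathToPartitionInfo, "hdfs://server/path"));
    }
}
```

The scan costs time linear in the number of map entries per lookup, which is the trade-off for
tolerating inconsistent key formats.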

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

