hive-dev mailing list archives

From "Dave Lerman (JIRA)" <>
Subject [jira] Updated: (HIVE-1006) getPartitionDescFromPath failing from CombineHiveInputFormat
Date Tue, 22 Dec 2009 16:40:29 GMT


Dave Lerman updated HIVE-1006:

    Attachment: hive.1006.2.patch

Sorry about that - uploaded the wrong patch for this and 1007.

> getPartitionDescFromPath failing from CombineHiveInputFormat
> ------------------------------------------------------------
>                 Key: HIVE-1006
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.1
>            Reporter: Dave Lerman
>         Attachments: hive.1006.1.patch, hive.1006.2.patch
> When HiveInputFormat.getPartitionDescFromPath is called from CombineHiveInputFormat,
it sometimes fails to return a matching partitionDesc, which then causes an exception down
the line because the split doesn't have an inputFormatClassName.
> The issue is that the path format used as the key in pathToPartitionInfo varies between
stages: in the first stage it's the complete path as returned from the table definitions (e.g.
hdfs://server/path), and in subsequent stages it's the complete path with port (e.g.
hdfs://server:8020/path) of the result of the previous stage.  This isn't a problem in HiveInputFormat,
since the directory you're looking up always uses the same format as the keys.  In CombineHiveInputFormat, however,
we take that path and look up its children in the file system to get all the block information,
and then use one of the returned paths to get the partition info, and that returned path
does not include the port.  So, in any stage after the first, we are looking for a path without
the port, but all the keys in the map contain a port, so we don't find a match.
> The attached patch may not be ideal -- it doesn't fix the underlying problem of inconsistent
path formats in pathToPartitionInfo -- it just works around it by walking through the map
and looking for a matching path rather than doing a hash lookup.
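
For illustration only, here is a minimal sketch of the kind of workaround described above: try the hash lookup first, then walk the map and compare paths while ignoring the port. This is not the attached patch. The class PartitionPathLookup, the methods findByPath and samePathIgnoringPort, and the use of plain strings as map values are all hypothetical stand-ins, and the sketch assumes the looked-up path names the same directory as the key.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the lookup described in the issue; not Hive code.
public class PartitionPathLookup {

    // Compares scheme, host, and path. The port is deliberately ignored.
    static boolean samePathIgnoringPort(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return Objects.equals(ua.getScheme(), ub.getScheme())
            && Objects.equals(ua.getHost(), ub.getHost())
            && Objects.equals(ua.getPath(), ub.getPath());
    }

    // Tries the exact key first, then falls back to a linear scan of the map.
    static <V> V findByPath(Map<String, V> pathToPartitionInfo, String dir) {
        V exact = pathToPartitionInfo.get(dir);
        if (exact != null) {
            return exact;
        }
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            if (samePathIgnoringPort(entry.getKey(), dir)) {
                return entry.getValue();
            }
        }
        return null; // no key matches, even with the port ignored
    }

    public static void main(String[] args) {
        // Key format of a later stage: the path includes the port.
        Map<String, String> pathToPartitionInfo = new LinkedHashMap<>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc-A");

        // Path as returned by the file system listing: no port.
        System.out.println(findByPath(pathToPartitionInfo, "hdfs://server/path"));
    }
}
```

The linear scan costs O(n) per lookup instead of O(1), which is the trade-off the description acknowledges; normalizing the keys when the map is built would avoid it.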

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.