hive-dev mailing list archives

From "Ning Zhang (JIRA)" <>
Subject [jira] Commented: (HIVE-1488) CombineHiveInputFormat for hadoop-19 is broken
Date Tue, 27 Jul 2010 01:10:17 GMT


Ning Zhang commented on HIVE-1488:

Both MultiFileInputFormat (MFIF) and CombineFileInputFormat (CFIF), on which CombineHiveInputFormat
(CHIF) is based, can combine multiple files into one split. In addition to the differences in
locality, CFIF also provides an interface to define pools and implement filters, so that you can
control which files should or should not be combined into one split. In CHIF, the logic that puts
multiple files from one directory (but not from different directories) into one split is
implemented in CombineHiveInputFormat.CombineFilter.
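
To illustrate the "same directory only" rule described above, here is a minimal plain-Java sketch. It is not the CombineHiveInputFormat.CombineFilter code and does not use the Hadoop API; the class name, the helper names and the sample paths are all hypothetical. It only shows the grouping idea: files may share a split only if they have the same parent directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DirectoryGrouping {
    // Hypothetical helper: parent directory of a slash-separated path.
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    // Group files by parent directory. Each group is a candidate for one
    // combined split; files from different directories never share a group.
    static Map<String, List<String>> groupByDirectory(List<String> files) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String file : files) {
            groups.computeIfAbsent(parentOf(file), k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample paths are made up for illustration.
        List<String> files = Arrays.asList(
            "/warehouse/t1/part-0", "/warehouse/t1/part-1", "/warehouse/t2/part-0");
        groupByDirectory(files).forEach(
            (dir, members) -> System.out.println(dir + " -> " + members));
    }
}
```

In the real classes the decision is expressed as a path filter attached to a pool rather than as an explicit grouping step, so treat this only as a picture of the intended outcome.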

It seems the MFIF support in hadoop 0.19 was added based on some external use cases in the
hive-user mailing list. I'm not sure whether anyone is still actively using it, though.

> CombineHiveInputFormat for hadoop-19 is broken
> ----------------------------------------------
>                 Key: HIVE-1488
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Ning Zhang
> I don't know if anyone is using it. After making some recent testing-related changes in HIVE-1408,
> combine[12].q no longer work when testing against hadoop-19. I have seen them fail earlier
> as well and did not investigate. Looking at the code, it seems pretty hokey:
> getInputPathsShim():
>       Path[] newPaths = new Path[paths.length];
>       // remove file:                                                               
>       for (int pos = 0; pos < paths.length; pos++) {
>         newPaths[pos] = new Path(paths[pos].toString().substring(5));
>       }
> Since we are no longer using the 'file:' namespace for the test warehouse, this is broken. But
> it would seem to be broken against any hdfs instance as well(?). It is also not clear what we
> are trying to do here.
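
The fragility of the fixed-offset `substring(5)` in the loop quoted above can be seen with plain strings. The sketch below is illustrative only: the class name, the method names, and the sample paths (including the `namenode:8020` host) are hypothetical. It compares the fixed-offset approach with parsing the string as a `java.net.URI` and keeping only its path component. Hadoop's `Path` exposes `toUri()`, so the same idea could be applied there, but whether dropping the scheme is the right behaviour for the shim is exactly what the report leaves open.

```java
import java.net.URI;

class SchemeStripping {
    // The quoted approach: drop the first five characters, which only
    // works when the string starts with exactly "file:".
    static String stripByOffset(String path) {
        return path.substring(5);
    }

    // Hypothetical alternative: parse the string as a URI and keep the path.
    static String stripByUri(String path) {
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        // Sample inputs are made up for illustration.
        String[] samples = {
            "file:/tmp/warehouse/t1",
            "hdfs://namenode:8020/user/hive/warehouse/t1",
            "/tmp/warehouse/t1"
        };
        for (String s : samples) {
            System.out.println(s + " | offset: " + stripByOffset(s)
                + " | uri: " + stripByUri(s));
        }
    }
}
```

With a scheme-less path the fixed offset cuts into the path itself, and with an `hdfs:` URI it leaves the authority attached, which matches the two failure cases raised in the report.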

