hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1488) CombineHiveInputFormat for hadoop-19 is broken
Date Mon, 26 Jul 2010 19:30:19 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892426#action_12892426 ]

Ning Zhang commented on HIVE-1488:
----------------------------------

The use of MultiFileInputFormat in Hadoop 0.19 was introduced in HIVE-1121 to provide
CombineFileInputFormat-like functionality on pre-0.20 versions of Hadoop. However,
MultiFileInputFormat does not support pooling (creating separate splits for files in
different directories), so it is impossible to use CombineHiveInputFormat to query multiple
partitions (each partition has to go into a different pool in the case of CombineFileInputFormat).
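
To make the gap concrete: on Hadoop 0.20+, CombineFileInputFormat lets a subclass register one
pool per directory via createPool(), so files from different partitions are never combined into
the same split; MultiFileInputFormat has no equivalent hook. A rough, illustrative sketch of that
pooling (hypothetical class name and filter logic, not Hive's actual shim code):

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

    public class PerPartitionPoolingInputFormat extends CombineFileInputFormat<Object, Object> {

      @Override
      public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
        // One pool per input (partition) directory, so files from different
        // partitions are never combined into the same split.
        for (Path dir : FileInputFormat.getInputPaths(job)) {
          final String prefix = dir.toString();
          createPool(job, new PathFilter() {
            public boolean accept(Path p) {
              return p.toString().startsWith(prefix);
            }
          });
        }
        return super.getSplits(job, numSplits);
      }

      @Override
      public RecordReader<Object, Object> getRecordReader(InputSplit split, JobConf job,
          Reporter reporter) throws IOException {
        // Not needed for this sketch; a real implementation would return a
        // record reader over the combined split.
        throw new UnsupportedOperationException("illustration only");
      }
    }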

Given this limitation, and since there is no easy way to fix it in Hive, I think we should
disable CombineHiveInputFormat on pre-0.20 Hadoop in strict mode and give a warning to users
in nonstrict mode.
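
A minimal sketch of the proposed behavior (hypothetical code, not a patch; it assumes
ShimLoader.getMajorVersion() returns the Hadoop version string and that plain string comparison
is good enough for the 0.17-0.20 range):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.shims.ShimLoader;

    public final class CombineInputFormatGuard {
      private static final Log LOG = LogFactory.getLog(CombineInputFormatGuard.class);

      /** Returns the input format class name that should actually be used. */
      public static String resolveInputFormat(Configuration conf) {
        String requested = conf.get("hive.input.format", "");
        boolean pre20 = ShimLoader.getMajorVersion().compareTo("0.20") < 0;
        if (pre20 && requested.endsWith("CombineHiveInputFormat")) {
          if ("strict".equalsIgnoreCase(conf.get("hive.mapred.mode", "nonstrict"))) {
            // strict mode: refuse to run with an input format we know is broken
            throw new RuntimeException(
                "CombineHiveInputFormat is not supported on Hadoop versions before 0.20");
          }
          // nonstrict mode: warn and fall back to the plain HiveInputFormat
          LOG.warn("CombineHiveInputFormat is not supported on Hadoop versions before 0.20; "
              + "falling back to HiveInputFormat");
          return "org.apache.hadoop.hive.ql.io.HiveInputFormat";
        }
        return requested;
      }
    }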

For unit testing, we can exclude combine2.q (which tests CombineHiveInputFormat across partitions)
from the Hadoop 0.19 test runs.

Any thoughts?

> CombineHiveInputFormat for hadoop-19 is broken
> ----------------------------------------------
>
>                 Key: HIVE-1488
>                 URL: https://issues.apache.org/jira/browse/HIVE-1488
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Ning Zhang
>
> I don't know if anyone is using it. After some recent testing-related changes in HIVE-1408,
combine[12].q are no longer working when testing against 0.19. I have seen them fail earlier
as well but did not investigate. Looking at the code, it seems pretty hokey:
> getInputPathsShim():
>       Path[] newPaths = new Path[paths.length];
>       // remove file:
>       for (int pos = 0; pos < paths.length; pos++) {
>         newPaths[pos] = new Path(paths[pos].toString().substring(5));
>       }
> since we are no longer using the 'file:' namespace for the test warehouse, this is broken. But
it would seem this would be broken against any HDFS instance(?). It's also not clear what we
are trying to do here.
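
For what it's worth, the fragile substring(5) above (which only strips a literal "file:" prefix)
could be replaced by something scheme-agnostic along these lines (sketch only, not a patch;
class and method names are illustrative):

    import org.apache.hadoop.fs.Path;

    public final class PathSchemeStripper {
      private PathSchemeStripper() {}

      /** Drop scheme and authority (file:, hdfs://host:port, ...) from each path. */
      public static Path[] stripSchemes(Path[] paths) {
        Path[] newPaths = new Path[paths.length];
        for (int pos = 0; pos < paths.length; pos++) {
          // toUri().getPath() returns only the path component, whatever the scheme
          newPaths[pos] = new Path(paths[pos].toUri().getPath());
        }
        return newPaths;
      }
    }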

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

