hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1001) CombinedHiveInputFormat should parse the inputpath correctly
Date Mon, 28 Dec 2009 23:52:29 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794968#action_12794968 ]

He Yongqiang commented on HIVE-1001:
------------------------------------

Hi Dave,

How many mappers did the map-reduce job have before you applied this patch? It is strange that the
merge job did not run. The merge job is started based on the average size of the result files
produced by the first job, and it always runs as a map-reduce job.

I also discussed this with Namit: we may need to add another parameter, "number of files", to determine
whether to start the merge job (this will be a separate issue).
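
To make the idea concrete, here is a rough sketch of such a check. It is not the actual Hive code: the class, the method, and both threshold parameters are hypothetical names, and it assumes the merge job is meant to run when the average result file is smaller than some configured size.

```java
public class MergeDecisionSketch {
    // Hypothetical check: merge only when the first job produced enough files
    // and their average size is below the threshold.
    // avgSizeThreshold and minFiles are illustrative parameters, not real Hive settings.
    static boolean shouldStartMerge(long totalBytes, int numFiles,
                                    long avgSizeThreshold, int minFiles) {
        if (numFiles <= 0) {
            return false; // no result files, nothing to merge
        }
        long avgSize = totalBytes / numFiles;
        return avgSize < avgSizeThreshold && numFiles >= minFiles;
    }

    public static void main(String[] args) {
        // 100 files totalling 100 MB: average 1 MB, below a 16 MB threshold
        System.out.println(shouldStartMerge(100L << 20, 100, 16L << 20, 2));
        // 1 small file: average is small, but there is nothing to combine
        System.out.println(shouldStartMerge(1L << 20, 1, 16L << 20, 2));
    }
}
```

With only the average-size test, a single small result file would also trigger a merge job, which is the case the extra "number of files" parameter would rule out.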

> CombinedHiveInputFormat should parse the inputpath correctly
> ------------------------------------------------------------
>
>                 Key: HIVE-1001
>                 URL: https://issues.apache.org/jira/browse/HIVE-1001
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>             Fix For: 0.5.0
>
>         Attachments: hive.1001.1.patch
>
>
> From David Lerman:
> "
> I'm running into errors where CombinedHiveInputFormat is combining data from
> two different tables which is causing problems because the tables have
> different input formats.
> It looks like the problem is in
> org.apache.hadoop.hive.shims.Hadoop20Shims.getInputPathsShim.  It calls
> CombineFileInputFormat.getInputPaths which returns the list of input paths
> and then chops off the first 5 characters to remove file: from the
> beginning, but the return value I'm getting from getInputPaths is actually
> hdfs://domain/path.  So then when it creates the pools using these paths,
> none of the input paths match the pools (since they're just the file path
> without protocol or domain).
> "
> We should use Path.toUri().getPath() to get the path part of a URI instead of just chopping
> off 5 characters.
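
The difference between the two approaches can be illustrated with a small standalone sketch. It uses only java.net.URI, not the Hadoop shim code itself; the class and method names are hypothetical, and "file:/path" is an assumed example of the input the original code expected.

```java
import java.net.URI;

public class InputPathSketch {
    // Mimics the behaviour described in the report: drop the first 5 characters.
    // This only yields a clean path when the string starts with "file:".
    static String chopFiveChars(String inputPath) {
        return inputPath.substring(5);
    }

    // Parse the string as a URI and keep only its path component,
    // regardless of scheme or authority.
    static String pathPart(String inputPath) {
        return URI.create(inputPath).getPath();
    }

    public static void main(String[] args) {
        String local = "file:/path";
        String remote = "hdfs://domain/path";

        System.out.println(chopFiveChars(local));   // /path
        System.out.println(chopFiveChars(remote));  // //domain/path
        System.out.println(pathPart(local));        // /path
        System.out.println(pathPart(remote));       // /path
    }
}
```

Chopping a fixed prefix leaves the authority ("//domain") attached for hdfs URIs, so the resulting string no longer equals the plain file path; parsing the URI gives the same path component for both schemes.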

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

