hadoop-hive-dev mailing list archives

From "Dave Lerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1001) CombinedHiveInputFormat should parse the inputpath correctly
Date Fri, 01 Jan 2010 20:03:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795788#action_12795788 ]

Dave Lerman commented on HIVE-1001:

Okay, that makes sense then.  Without the patch, the two input pools have corrupt paths so
the input files don't match either pool and get processed together in one pool of non-matching
paths.  This yields one split and one mapper, so the merge step doesn't run (since there's
only one output file).

With the patch, the pools get created correctly, so the two files are processed in separate
pools, which yields two splits and two mappers, so the merge step runs.
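The effect on the number of splits can be illustrated with a simplified model. This is only a sketch, not Hadoop's or Hive's actual pooling code: the class PoolSketch, the prefix-matching rule, and the /warehouse/... paths are all assumptions made for illustration. Each non-empty group stands for one split (and so one mapper).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of pool matching; not the real
// CombineFileInputFormat logic.
class PoolSketch {
    // Key used for files that match none of the configured pools.
    static final String UNMATCHED = "<no pool>";

    // Assigns each file to the first pool whose path is a prefix of the
    // file path; files matching no pool all share the UNMATCHED group.
    static Map<String, List<String>> group(List<String> poolPaths, List<String> files) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String file : files) {
            String key = UNMATCHED;
            for (String pool : poolPaths) {
                if (file.startsWith(pool)) {
                    key = pool;
                    break;
                }
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative file paths for two tables (assumed, not from the issue).
        List<String> files = Arrays.asList("/warehouse/t1/part-0", "/warehouse/t2/part-0");

        // Corrupt pool paths, as produced by dropping 5 characters from an hdfs:// URI.
        List<String> corruptPools = Arrays.asList("//domain/warehouse/t1", "//domain/warehouse/t2");
        // Correctly parsed pool paths.
        List<String> correctPools = Arrays.asList("/warehouse/t1", "/warehouse/t2");

        System.out.println("corrupt pools -> groups: " + group(corruptPools, files).size());
        System.out.println("correct pools -> groups: " + group(correctPools, files).size());
    }
}
```

In this model the corrupt pools leave both files in the single unmatched group, while the correct pools put each file in its own group, which mirrors the one-split versus two-split behaviour described above.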

Thanks for the help.

> CombinedHiveInputFormat should parse the inputpath correctly
> ------------------------------------------------------------
>                 Key: HIVE-1001
>                 URL: https://issues.apache.org/jira/browse/HIVE-1001
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>             Fix For: 0.5.0
>         Attachments: hive.1001.1.patch
> From David Lerman:
> "
> I'm running into errors where CombinedHiveInputFormat is combining data from
> two different tables which is causing problems because the tables have
> different input formats.
> It looks like the problem is in
> org.apache.hadoop.hive.shims.Hadoop20Shims.getInputPathsShim.  It calls
> CombineFileInputFormat.getInputPaths which returns the list of input paths
> and then chops off the first 5 characters to remove file: from the
> beginning, but the return value I'm getting from getInputPaths is actually
> hdfs://domain/path.  So then when it creates the pools using these paths,
> none of the input paths match the pools (since they're just the file path
> without protocol or domain).
> "
> We should use Path.toUri().getPath() to get the path part of a URI instead of just chopping
> off 5 chars.
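
The difference between the two approaches can be seen with plain java.net.URI, without any Hadoop dependency. This is a minimal sketch, not the code from the attached patch: the class InputPathDemo and its method names are hypothetical, and file:/path is an assumed local counterpart to the hdfs://domain/path value quoted in the report.

```java
import java.net.URI;

// Hypothetical demo class; not part of Hive or the attached patch.
class InputPathDemo {
    // Mimics the fragile approach: assume the string starts with "file:"
    // and drop the first 5 characters.
    static String chopFiveChars(String inputPath) {
        return inputPath.substring(5);
    }

    // Scheme-independent alternative: parse the string as a URI and take
    // only its path component.
    static String pathComponent(String inputPath) {
        return URI.create(inputPath).getPath();
    }

    public static void main(String[] args) {
        String local = "file:/path";          // assumed local-filesystem form
        String hdfs = "hdfs://domain/path";   // form quoted in the report

        System.out.println(chopFiveChars(local));  // /path
        System.out.println(chopFiveChars(hdfs));   // //domain/path
        System.out.println(pathComponent(local));  // /path
        System.out.println(pathComponent(hdfs));   // /path
    }
}
```

Dropping 5 characters happens to work for file: URIs but leaves the authority in place for hdfs:// URIs, whereas taking the URI's path component gives the same result for both.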

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
