hadoop-mapreduce-issues mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
Date Wed, 08 May 2013 02:09:16 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner reopened MAPREDUCE-5038:
-------------------------------------------


Can someone please re-investigate? This is causing Hive to fail its test with the "har" filesystem.
Here's the stack trace:

{noformat}
java.io.IOException: URI: har://pfile-:/grid/0/jenkins/workspace/UnitTest-Hive-condor-0.11.0/label/centos5/hdp-BUILDS/hive-0.11.0.1.3.0.0/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/data.har/hr=11/000000_0
is an invalid Har URI since host==null. Expecting har://<scheme>-<host>/<path>.
at org.apache.hadoop.fs.HarFileSystem.decodeHarURI(HarFileSystem.java:191)
at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:100)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1482)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:251)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:270)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:226)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:687)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Job Submission failed with exception 'java.io.IOException(URI: har://pfile-:/grid/0/jenkins/workspace/UnitTest-Hive-condor-0.11.0/label/centos5/hdp-BUILDS/hive-0.11.0.1.3.0.0/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/data.har/hr=11/000000_0
is an invalid Har URI since host==null. Expecting har://<scheme>-<host>/<path>.)'
{noformat}
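
The "host==null" part of the message can be reproduced without Hadoop. The following is a minimal sketch using only java.net.URI; it is not the HarFileSystem code. The class name HarUriCheck, the shortened path, and the "hdfs-namenode:8020" authority are hypothetical and chosen for illustration. It assumes the check in the trace relies on the host as parsed by java.net.URI, which does not accept a host label ending in "-", so the authority "pfile-:" is expected to yield no host.

```java
import java.net.URI;

public class HarUriCheck {
    // Host as parsed by java.net.URI; null when the authority is not a valid server authority.
    public static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Raw authority component (everything between "//" and the path).
    public static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        // Same authority as in the stack trace, with a shortened (illustrative) path.
        String fromTrace = "har://pfile-:/grid/0/data.har/hr=11/000000_0";
        // Hypothetical URI following the har://<scheme>-<host>/<path> shape.
        String wellFormed = "har://hdfs-namenode:8020/grid/0/data.har/hr=11/000000_0";

        System.out.println(authorityOf(fromTrace) + " -> host=" + hostOf(fromTrace));
        System.out.println(authorityOf(wellFormed) + " -> host=" + hostOf(wellFormed));
    }
}
```

In the first URI the authority is still present ("pfile-:"), but no host can be extracted from it, which matches the condition reported in the exception.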

----

Steps to reproduce:
{noformat}
$ ant test -Dtestcase=TestCliDriver -Dqfile=archive_multi.q
{noformat}

I also verified that the same test passes on a local build with this patch reverted.

Thanks.
                
> old API CombineFileInputFormat missing fixes that are in new API 
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>             Fix For: 1.3.0
>
>         Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS
> In trunk this is not an issue as the one in mapred extends the one in mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
