hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5805) problem using top level s3 buckets as input/output directories
Date Thu, 21 May 2009 15:06:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ]

Tom White commented on HADOOP-5805:

This looks like a good fix. The test should assert that it gets back an appropriate
FileStatus object.
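
The patch itself is not reproduced in this message. As an illustration only, here is a minimal self-contained sketch, assuming the fix makes pathToKey treat a URI that has a scheme but an empty path (a bare bucket such as s3n://infocloud-output) as the bucket root instead of rejecting it. The class name S3KeySketch is hypothetical, and the sketch uses java.net.URI rather than Hadoop's Path so that it runs without Hadoop on the classpath.

```java
import java.net.URI;

// Hypothetical sketch, not the actual NativeS3FileSystem code: maps an
// s3n-style URI to an object key, where "" denotes the bucket root.
final class S3KeySketch {
    private S3KeySketch() {}

    static String pathToKey(URI uri) {
        String path = uri.getPath();
        // A bare bucket URI ("s3n://bucket") parses with an empty path.
        // Treat it as the bucket root rather than as a non-absolute path.
        if (uri.getScheme() != null && (path == null || path.isEmpty())) {
            return "";
        }
        // Anything else must still be absolute, as before.
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        // Drop the leading slash: "/output-subdir" -> "output-subdir", "/" -> "".
        return path.substring(1);
    }
}
```

With this mapping, s3n://infocloud-output and s3n://infocloud-output/ both resolve to the empty key, s3n://infocloud-output/output-subdir resolves to output-subdir, and a scheme-less relative path is still rejected. A test of the real patch would go one step further and assert on the FileStatus returned for the bucket root.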

The patch needs to be regenerated since the tests have moved from src/test to src/test/core.

For the second problem, you could subclass your output format to override checkOutputSpecs()
so it doesn't throw FileAlreadyExistsException. But I agree it would be nicer to deal with
this generally. Perhaps open a separate Jira issue, since this would affect more than NativeS3FileSystem.
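
The override suggested above can be modelled without Hadoop on the classpath. The sketch below assumes the second problem is that the output existence check rejects a bucket that, by definition, already exists. OutputExistsException, StrictOutputCheck and BucketRootTolerantCheck are hypothetical stand-ins for FileAlreadyExistsException, the stock output format and your subclass, and a set of existing locations stands in for the file system. In a real job the subclass would extend the output format you already use and override its checkOutputSpecs() method.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical stand-in for Hadoop's FileAlreadyExistsException.
class OutputExistsException extends RuntimeException {
    OutputExistsException(String msg) { super(msg); }
}

// Models the default behaviour: refuse to run if the output location exists.
class StrictOutputCheck {
    protected final Set<String> existing;

    StrictOutputCheck(Set<String> existing) { this.existing = existing; }

    void checkOutputSpecs(URI out) {
        if (existing.contains(out.toString())) {
            throw new OutputExistsException("Output directory " + out + " already exists");
        }
    }
}

// Models the suggested subclass: a bucket root always exists, so skip the
// check there but keep it for sub-directories.
class BucketRootTolerantCheck extends StrictOutputCheck {
    BucketRootTolerantCheck(Set<String> existing) { super(existing); }

    @Override
    void checkOutputSpecs(URI out) {
        String path = out.getPath();
        boolean bucketRoot = path == null || path.isEmpty() || path.equals("/");
        if (!bucketRoot) {
            super.checkOutputSpecs(out);
        }
    }
}
```

Skipping the check only at the bucket root is one possible design choice: it keeps the protection against overwriting an existing sub-directory. Overriding the method to do nothing at all would also match the suggestion, at the cost of that protection.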

> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>                 Key: HADOOP-5805
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5805
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.3
>         Environment: ec2, cloudera AMI, 20 nodes
>            Reporter: Arun Jacob
>             Fix For: 0.21.0
>         Attachments: HADOOP-5805-0.patch
> When I specify top-level S3 buckets as input or output directories, I get the following exception:
> hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
> java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
>         at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>         at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> The workaround is to specify input/output buckets with sub-directories:
> hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir  s3n://infocloud-output/output-subdir
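
Why the workaround helps can be seen from how a bare bucket URI parses. The sketch below uses java.net.URI and assumes Hadoop's Path applies a similar leading-slash test when deciding whether a path is absolute; BucketUriPaths and hasAbsolutePath are hypothetical names used only for illustration.

```java
import java.net.URI;

// Illustration only: a bare bucket URI has an empty path, so a
// "starts with /" absoluteness test fails; adding a sub-directory fixes it.
final class BucketUriPaths {
    private BucketUriPaths() {}

    static boolean hasAbsolutePath(String uri) {
        return URI.create(uri).getPath().startsWith("/");
    }

    public static void main(String[] args) {
        String[] uris = {"s3n://infocloud-output", "s3n://infocloud-output/output-subdir"};
        for (String s : uris) {
            URI u = URI.create(s);
            System.out.println(s + " -> path=\"" + u.getPath() + "\", absolute=" + hasAbsolutePath(s));
        }
    }
}
```

The empty path of s3n://infocloud-output fails the test, which is consistent with the IllegalArgumentException thrown from pathToKey in the trace above, while s3n://infocloud-output/output-subdir has the path /output-subdir and passes.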

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
