incubator-crunch-dev mailing list archives

From "Shawn Smith (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-47) Inputs and outputs can't use non-default Hadoop FileSystem
Date Tue, 14 Aug 2012 22:37:37 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Smith updated CRUNCH-47:
------------------------------

    Attachment: multiple-file-systems.patch

The attached patch replaces all calls to FileSystem.get(Configuration) with Path.getFileSystem().

It uses FileUtil.copy(..., deleteSource=true, ...) instead of FileSystem.rename() when the source and destination are on different file systems.
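
For reference, the pattern the patch follows looks roughly like the sketch below (class and method names are illustrative, not the actual Crunch code): resolve the FileSystem from each Path rather than from the default configuration, and fall back to a copy-and-delete when the source and destination file systems differ.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class FsPatterns {
      // Resolve the file system from the path itself instead of assuming
      // the default (fs.default.name) file system.
      static FileSystem fsFor(Path path, Configuration conf) throws IOException {
        return path.getFileSystem(conf);  // instead of FileSystem.get(conf)
      }

      // Move a file even when src and dst live on different file systems
      // (e.g. hdfs:// -> s3n://), where FileSystem.rename() does not work.
      static void move(Path src, Path dst, Configuration conf) throws IOException {
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);
        if (srcFs.getUri().equals(dstFs.getUri())) {
          srcFs.rename(src, dst);  // same file system: cheap rename
        } else {
          // cross-file-system move: copy, then delete the source
          FileUtil.copy(srcFs, src, dstFs, dst, true /* deleteSource */, conf);
        }
      }
    }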
                
> Inputs and outputs can't use non-default Hadoop FileSystem
> ----------------------------------------------------------
>
>                 Key: CRUNCH-47
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-47
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.3.0
>         Environment: Elastic MapReduce Hadoop 1.0.3
>            Reporter: Shawn Smith
>         Attachments: multiple-file-systems.patch
>
>
> I'm getting the following exception trying to use Crunch with Elastic MapReduce where input and output files use the Native S3 FileSystem and intermediate files use HDFS. HDFS is configured as the default file system:
> Exception in thread "main" java.lang.IllegalArgumentException: This file system object (hdfs://10.114.37.65:9000) does not support access to the request path 's3n://test-bucket/test/Input.avro' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
> 	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:767)
> 	at org.apache.crunch.io.SourceTargetHelper.getPathSize(SourceTargetHelper.java:44)
> It looks like Crunch has a number of calls to FileSystem.get(Configuration) that assume the default configured file system and fail with an S3 input or output.
> Also, CrunchJob.handleMultiPaths() calls FileSystem.rename(), which works only if the source and destination use the same file system. This breaks the final upload of the output files from HDFS to S3.
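
To illustrate the failure mode (a hypothetical snippet, not the actual SourceTargetHelper code): FileSystem.get(conf) returns the default HDFS file system, whose path check rejects the s3n:// path with the IllegalArgumentException quoted above, whereas resolving the file system from the path itself works.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathSizeExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();  // fs.default.name points at HDFS
        Path input = new Path("s3n://test-bucket/test/Input.avro");

        // Fails: returns the default (HDFS) file system, which throws
        // IllegalArgumentException from checkPath() for an s3n:// path.
        // long size = FileSystem.get(conf).getFileStatus(input).getLen();

        // Works: resolve the file system from the path's own scheme.
        long size = input.getFileSystem(conf).getFileStatus(input).getLen();
        System.out.println(input + " is " + size + " bytes");
      }
    }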

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
