crunch-dev mailing list archives

From "Shawn Smith (JIRA)" <>
Subject [jira] [Created] (CRUNCH-47) Inputs and outputs can't use non-default Hadoop FileSystem
Date Tue, 14 Aug 2012 22:29:38 GMT
Shawn Smith created CRUNCH-47:

             Summary: Inputs and outputs can't use non-default Hadoop FileSystem
                 Key: CRUNCH-47
             Project: Crunch
          Issue Type: Bug
          Components: IO
    Affects Versions: 0.3.0
         Environment: Elastic MapReduce Hadoop 1.0.3
            Reporter: Shawn Smith

I'm getting the following exception when trying to use Crunch on Elastic MapReduce, where input
and output files use the Native S3 FileSystem (s3n) and intermediate files use HDFS.  HDFS is
configured as the default file system:

Exception in thread "main" java.lang.IllegalArgumentException: This file system object (hdfs://
does not support access to the request path 's3n://test-bucket/test/Input.avro' You possibly
called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain
a file system supporting your path.
	at org.apache.hadoop.fs.FileSystem.checkPath(
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(
	at org.apache.hadoop.fs.FileSystem.exists(

It looks like Crunch has a number of calls to FileSystem.get(Configuration) that assume the
default configured file system, and therefore fail whenever an input or output lives on S3.
The fix would be to resolve each path against its own file system via FileSystem.get(uri, conf)
(or Path.getFileSystem(conf)) instead.
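The mismatch is easy to see outside Hadoop: FileSystem.get(conf) binds to the default file
system's scheme (hdfs), while the path carries its own scheme (s3n), and Hadoop's checkPath
essentially rejects any path whose scheme differs from the bound one. A minimal stand-alone
sketch of that check, using only java.net.URI (the class and method names here are illustrative,
not Crunch's or Hadoop's actual code):

```java
import java.net.URI;

public class SchemeCheck {
    // Mimics the spirit of Hadoop's FileSystem.checkPath(): a file system
    // bound to one scheme rejects paths that name a different scheme.
    static boolean supportsPath(URI fsUri, URI path) {
        String pathScheme = path.getScheme();
        // A scheme-less path is resolved against the bound file system.
        if (pathScheme == null) {
            return true;
        }
        return pathScheme.equalsIgnoreCase(fsUri.getScheme());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020/");
        URI input = URI.create("s3n://test-bucket/test/Input.avro");

        // FileSystem.get(conf) binds to hdfs, so the s3n path is rejected.
        System.out.println(supportsPath(defaultFs, input)); // false

        // FileSystem.get(uri, conf) binds to the path's own scheme instead.
        URI s3Fs = URI.create("s3n://test-bucket/");
        System.out.println(supportsPath(s3Fs, input)); // true
    }
}
```

This is why the exception message itself suggests FileSystem.get(uri, conf): resolving each
path against its own URI picks the matching FileSystem implementation.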

Also, CrunchJob.handleMultiPaths() calls FileSystem.rename(), which works only when the source
and destination are on the same file system.  This breaks the final upload of the output files
from HDFS to S3.
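The usual remedy for a cross-file-system move is to fall back to copy-then-delete when rename
cannot work (in Hadoop terms, something like FileUtil.copy with deleteSource set, rather than
rename; that mapping to Crunch's code is my assumption, not a tested patch). The same pattern,
illustrated stand-alone with java.nio, where an atomic move across stores fails for the
analogous reason:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeMove {
    // Moves a file, falling back to copy + delete when an atomic rename
    // is not possible (e.g. source and destination live on different stores).
    static void move(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Rename can't cross stores; copy the bytes, then remove the source.
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(src);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("crunch-out", ".avro");
        Files.write(src, "payload".getBytes());
        Path dst = Files.createTempDirectory("upload").resolve("Input.avro");
        move(src, dst);
        System.out.println(Files.exists(dst) && !Files.exists(src)); // true
    }
}
```

The trade-off is that copy-then-delete is not atomic, so a failure mid-copy can leave a partial
destination file; job-output promotion code typically copies to a temporary name and renames
within the destination file system as a final step.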

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
For more information on JIRA, see:

