Date: Wed, 15 Aug 2012 09:29:38 +1100 (NCT)
From: "Shawn Smith (JIRA)"
To: crunch-dev@incubator.apache.org
Subject: [jira] [Created] (CRUNCH-47) Inputs and outputs can't use non-default Hadoop FileSystem

Shawn Smith created CRUNCH-47:
---------------------------------

             Summary: Inputs and outputs can't use non-default Hadoop FileSystem
                 Key: CRUNCH-47
                 URL: https://issues.apache.org/jira/browse/CRUNCH-47
             Project: Crunch
          Issue Type: Bug
          Components: IO
    Affects Versions: 0.3.0
         Environment: Elastic MapReduce Hadoop 1.0.3
            Reporter: Shawn Smith

I'm getting the following exception trying to use Crunch with Elastic MapReduce where input and output files use the Native S3 FileSystem and intermediate files use HDFS. HDFS is configured as the default file system:

Exception in thread "main" java.lang.IllegalArgumentException: This file system object (hdfs://10.114.37.65:9000) does not support access to the request path 's3n://test-bucket/test/Input.avro'
You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:767)
	at org.apache.crunch.io.SourceTargetHelper.getPathSize(SourceTargetHelper.java:44)

It looks like Crunch has a number of calls to FileSystem.get(Configuration) that assume the default configured file system and fail with an S3 input or output.

Also, CrunchJob.handleMultiPaths() calls FileSystem.rename() which works only if the source and destination use the same file system. This breaks the final upload of the output files from HDFS to S3.
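
To make the two failure modes concrete, here is a rough sketch of the decision a fix would need to make. It uses only java.net.URI so it runs without Hadoop on the classpath. FsRouting, fileSystemKey, chooseMove and MoveStrategy are hypothetical names for illustration, not Crunch or Hadoop APIs, and the /tmp/crunch and test/output paths are made up. In Hadoop itself, the path-aware lookup would be FileSystem.get(uri, conf) as the exception message suggests, or Path.getFileSystem(conf), and a cross-file-system move could be done with FileUtil.copy instead of rename(). Comparing scheme and authority is a simplification of how Hadoop matches paths to file systems.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper for illustration only; not part of Crunch or Hadoop.
 * It models which file system "owns" a path and whether a plain rename can work.
 */
final class FsRouting {

    /** How a finished output file could be moved to its destination. */
    enum MoveStrategy { RENAME, COPY_THEN_DELETE }

    private FsRouting() {}

    /**
     * Returns "scheme://authority" for the file system that owns the path.
     * A path without a scheme falls back to the default file system, which is
     * the only case a FileSystem.get(conf) style lookup gets right.
     */
    static String fileSystemKey(URI defaultFs, String path) {
        URI uri = URI.create(path);
        URI owner = (uri.getScheme() == null) ? defaultFs : uri;
        String authority = (owner.getAuthority() == null) ? "" : owner.getAuthority();
        return owner.getScheme().toLowerCase(Locale.ROOT) + "://" + authority;
    }

    /**
     * A rename can only succeed inside one file system. When source and
     * destination differ, the data has to be copied and the source deleted.
     */
    static MoveStrategy chooseMove(URI defaultFs, String src, String dst) {
        boolean same = fileSystemKey(defaultFs, src).equals(fileSystemKey(defaultFs, dst));
        return same ? MoveStrategy.RENAME : MoveStrategy.COPY_THEN_DELETE;
    }

    public static void main(String[] args) {
        // Default file system taken from the exception message above.
        URI defaultFs = URI.create("hdfs://10.114.37.65:9000");

        // The input path from the exception does not belong to the default file system.
        System.out.println(fileSystemKey(defaultFs, "s3n://test-bucket/test/Input.avro"));

        // Hypothetical intermediate and final output paths (HDFS to S3).
        System.out.println(chooseMove(defaultFs,
                "/tmp/crunch/part-r-00000",
                "s3n://test-bucket/test/output/part-r-00000"));
    }
}
```

With this split, getPathSize() style lookups would resolve the file system from each path, and handleMultiPaths() would only call rename() when both sides land on the same file system.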