Date: Mon, 17 Jun 2013 11:05:20 +0000 (UTC)
From: "Deepak Subhramanian (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Updated] (CRUNCH-220) Crunch not working with S3

     [ https://issues.apache.org/jira/browse/CRUNCH-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Subhramanian updated CRUNCH-220:
---------------------------------------

    Summary: Crunch not working with S3  (was: Crunch)

> Crunch not working with S3
> --------------------------
>
>                 Key: CRUNCH-220
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-220
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.6.0
>         Environment: Cloudera Hadoop with Amazon S3
>            Reporter: Deepak Subhramanian
>            Priority: Minor
>
> I am trying to use Crunch
> to read a file from S3 and write to S3. I am able to read the file, but I get an error while writing to S3. I am not sure whether it is a bug or whether I am missing a Hadoop configuration setting. I am able to read from S3 and write to a local file or to HDFS directly. Here is the code and the error. I am passing the S3 key and secret as parameters.
>
> PCollection<String> lines = pipeline.read(From.sequenceFile(inputdir, Writables.strings()));
>
> PCollection<String> textline = lines.parallelDo(new DoFn<String, String>() {
>     public void process(String line, Emitter<String> emitter) {
>         if (headerNotWritten) {
>             //emitter.emit("Writing Header");
>             emitter.emit(table_header.getTable_header());
>             emitter.emit(line);
>             headerNotWritten = false;
>         } else {
>             emitter.emit(line);
>         }
>     }
> }, Writables.strings()); // Indicates the serialization format
>
> pipeline.writeTextFile(textline, outputdir);
>
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://bktname/testcsv, expected: hdfs://ip-address.compute.internal
> [ip-addresscompute.amazonaws.com] out:     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:797)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.crunch.io.impl.FileTargetImpl.handleExisting(FileTargetImpl.java:133)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:212)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:200)
> [ip-address-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.crunch.impl.mr.collect.PCollectionImpl.write(PCollectionImpl.java:132)
> [ec2-79-125-102-82.eu-west-1.compute.amazonaws.com] out:     at org.apache.crunch.impl.mr.MRPipeline.writeTextFile(MRPipeline.java:356)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira