crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-220) Crunch not working with S3
Date Tue, 18 Jun 2013 21:14:20 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687206#comment-13687206 ]

Josh Wills commented on CRUNCH-220:
-----------------------------------

Hey Deepak, I can't seem to replicate that. You haven't made any edits to the Avro version in
the POM, right?
                
> Crunch not working with S3
> --------------------------
>
>                 Key: CRUNCH-220
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-220
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.6.0
>         Environment: Cloudera Hadoop with Amazon S3
>            Reporter: Deepak Subhramanian
>            Assignee: Josh Wills
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: CRUNCH-220.patch
>
>
> I am trying to use Crunch to read a file from S3 and write to S3. I am able to read the
> file, but I get an error while writing to S3. I'm not sure if it is a bug or if I am missing
> a Hadoop configuration. I am able to read from S3 and write to a local file or to HDFS directly.
> Here is the code and the error. I am passing the S3 key and secret as parameters.
> // headerNotWritten and table_header are declared elsewhere in the reporter's code (not shown).
> PCollection<String> lines = pipeline.read(From.sequenceFile(inputdir, Writables.strings()));
>
> PCollection<String> textline = lines.parallelDo(new DoFn<String, String>() {
>     @Override
>     public void process(String line, Emitter<String> emitter) {
>         if (headerNotWritten) {
>             // Emit the header before the first line this DoFn instance sees.
>             emitter.emit(table_header.getTable_header());
>             emitter.emit(line);
>             headerNotWritten = false;
>         } else {
>             emitter.emit(line);
>         }
>     }
> }, Writables.strings()); // Indicates the serialization format
>
> pipeline.writeTextFile(textline, outputdir);
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://bktname/testcsv, expected: hdfs://ip-address.compute.internal
> 	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:797)
> 	at org.apache.crunch.io.impl.FileTargetImpl.handleExisting(FileTargetImpl.java:133)
> 	at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:212)
> 	at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:200)
> 	at org.apache.crunch.impl.mr.collect.PCollectionImpl.write(PCollectionImpl.java:132)
> 	at org.apache.crunch.impl.mr.MRPipeline.writeTextFile(MRPipeline.java:356)
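
The trace suggests that the existence check in FileTargetImpl.handleExisting ran against DistributedFileSystem, i.e. the cluster's default HDFS filesystem, which rejects an s3n:// path in FileSystem.checkPath. In Hadoop code the usual way to avoid this is to obtain the filesystem from the path itself with Path.getFileSystem(conf) rather than the default one from FileSystem.get(conf). The sketch below is a simplified, hypothetical model of that check in plain Java with no Hadoop dependency. ToyFileSystem, forPath, and the hdfs://namenode:8020 address are illustrative assumptions, not Hadoop or Crunch APIs, and this is not necessarily what the attached patch does.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical, simplified model of Hadoop's "Wrong FS" check. No Hadoop
 * classes are used; ToyFileSystem and forPath are illustrative names only.
 */
public class WrongFsDemo {

    /** A toy filesystem bound to one scheme and authority, e.g. hdfs://namenode:8020. */
    static final class ToyFileSystem {
        final URI uri;

        ToyFileSystem(URI uri) {
            this.uri = uri;
        }

        /** Rejects paths whose scheme or authority differ from this filesystem's URI. */
        void checkPath(URI path) {
            String scheme = path.getScheme();
            if (scheme == null) {
                return; // scheme-less paths are resolved against this filesystem
            }
            boolean sameScheme = scheme.equalsIgnoreCase(uri.getScheme());
            boolean sameAuthority = Objects.equals(path.getAuthority(), uri.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + uri);
            }
        }
    }

    /**
     * Picks the filesystem matching the path's own scheme (the idea behind
     * Hadoop's Path.getFileSystem(conf)), falling back to the default
     * filesystem for scheme-less paths.
     */
    static ToyFileSystem forPath(URI path, Map<String, ToyFileSystem> byScheme, ToyFileSystem defaultFs) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return defaultFs;
        }
        ToyFileSystem fs = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (fs == null) {
            throw new IllegalArgumentException("No filesystem for scheme: " + scheme);
        }
        return fs;
    }

    public static void main(String[] args) {
        ToyFileSystem hdfs = new ToyFileSystem(URI.create("hdfs://namenode:8020"));
        ToyFileSystem s3n = new ToyFileSystem(URI.create("s3n://bktname"));
        Map<String, ToyFileSystem> byScheme = Map.of("hdfs", hdfs, "s3n", s3n);
        URI output = URI.create("s3n://bktname/testcsv");

        // Checking the S3 path against the default (HDFS) filesystem fails, as in the trace.
        try {
            hdfs.checkPath(output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Resolving the filesystem from the path itself passes the check.
        ToyFileSystem chosen = forPath(output, byScheme, hdfs);
        chosen.checkPath(output);
        System.out.println("s3n path accepted by " + chosen.uri);
    }
}
```

Running main prints "Wrong FS: s3n://bktname/testcsv, expected: hdfs://namenode:8020" followed by "s3n path accepted by s3n://bktname". The real FileSystem.checkPath has additional defaulting rules for missing authorities that this model leaves out.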

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
