beam-commits mailing list archives

From "Luke Cwik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.
Date Fri, 12 May 2017 16:48:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008367#comment-16008367 ]

Luke Cwik commented on BEAM-2277:
---------------------------------

The problem with the design is that internal portions of FileBasedSource/FileBasedSink assume
that string equality and ordering are equivalent to "ResourceId" equality and ordering. That
assumption is incorrect: the URIs file:/my/pAth, file:///my/pAth and file:///my/p%41th all
identify the same resource, yet their string forms differ. Fixing this would require the code
that uses these strings to pass them through a URI normalizer first.
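
To make the point concrete, here is a minimal sketch of what such a normalizer could look
like, using only java.net.URI. The class UriNormalizerSketch and its normalize method are
hypothetical names for illustration and are not part of Beam. The sketch decodes the whole
path, which is a simplification: it would also collapse escapes such as %2F that may be
significant on some file systems.

```java
import java.net.URI;
import java.util.Locale;

final class UriNormalizerSketch {
    /** Hypothetical helper: reduce a hierarchical URI string to one canonical spelling. */
    static String normalize(String spec) {
        // URI.normalize() removes "." and ".." path segments where possible.
        URI uri = URI.create(spec).normalize();
        String scheme = uri.getScheme() == null
                ? ""
                : uri.getScheme().toLowerCase(Locale.ROOT);
        // Treat a missing authority as empty so "file:/x" and "file:///x" come out alike.
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        // getPath() returns the path with percent-escapes decoded, so "%41" becomes "A".
        return scheme + "://" + authority + uri.getPath();
    }

    public static void main(String[] args) {
        String[] specs = {"file:/my/pAth", "file:///my/pAth", "file:///my/p%41th"};
        for (String spec : specs) {
            // Plain string comparison treats these three as different values;
            // after normalization they share a single spelling.
            System.out.println(spec + " -> " + normalize(spec));
        }
    }
}
```

With something along these lines, code that compares or sorts by string would operate on the
normalized form rather than on whatever spelling the user supplied.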

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2277
>                 URL: https://issues.apache.org/jira/browse/BEAM-2277
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received file, hdfs.
> 	at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
> 	at org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
> 	at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
> 	at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
> 	at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
> 	at org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
