beam-commits mailing list archives

From François Wagner (JIRA) <j...@apache.org>
Subject [jira] [Commented] (BEAM-2429) Conflicting filesystems with used of HadoopFileSystem
Date Mon, 10 Jul 2017 07:59:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079994#comment-16079994 ]

François Wagner commented on BEAM-2429:
---------------------------------------

Hi Manoj,

Here is the code that worked for me:

```
String[] args = new String[]{ "--hdfsConfiguration=[{\"fs.defaultFS\" : \"hdfs://host:port\"}]"};
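// Declare the options variable with its type; host:port is a placeholder for the NameNode address.
HadoopFileSystemOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(HadoopFileSystemOptions.class);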
Pipeline pipeline = Pipeline.create(options);
```
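
Hand-escaping the quotes in the `--hdfsConfiguration` value is easy to get wrong, so here is a minimal sketch of building that argument from a map. `HdfsArgs`, `hdfsConfigurationArg` and `quote` are hypothetical names and not part of Beam; the sketch only assumes the option takes a JSON list of key/value objects, as in the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class HdfsArgs {
    // Hypothetical helper (not part of Beam): renders one configuration map as the
    // JSON list passed to --hdfsConfiguration in the snippet above.
    static String hdfsConfigurationArg(Map<String, String> conf) {
        String body = conf.entrySet().stream()
            .map(e -> quote(e.getKey()) + ": " + quote(e.getValue()))
            .collect(Collectors.joining(", "));
        return "--hdfsConfiguration=[{" + body + "}]";
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Placeholder address, as in the snippet above.
        conf.put("fs.defaultFS", "hdfs://host:port");
        System.out.println(hdfsConfigurationArg(conf));
    }
}
```

The resulting string can be placed in the `args` array that is handed to `PipelineOptionsFactory.fromArgs(args)`.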

Cheers,
François

> Conflicting filesystems with used of HadoopFileSystem
> -----------------------------------------------------
>
>                 Key: BEAM-2429
>                 URL: https://issues.apache.org/jira/browse/BEAM-2429
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 2.0.0
>            Reporter: François Wagner
>            Assignee: Flavio Fiszman
>             Fix For: 2.0.0
>
>
> I'm facing an issue when trying to use HadoopFileSystem in my pipeline. It looks like HadoopFileSystem is registering itself under the `file` scheme (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79), hence the following exception is thrown when trying to register HadoopFileSystem:
> java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: [org.apache.beam.sdk.io.LocalFileSystem, org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
> 	at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
> What is the correct way to handle `hdfs` URLs out of the box with TextIO & AvroIO?
> {code:java}
>     String[] args = new String[]{
>         "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": \"true\"}]"};
>     HadoopFileSystemOptions options = PipelineOptionsFactory
>         .fromArgs(args)
>         .withValidation()
>         .as(HadoopFileSystemOptions.class);
>     Pipeline pipeline = Pipeline.create(options); 
> {code}
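
A likely reading of the exception quoted above is that the configuration in the issue never sets `fs.defaultFS`, so Hadoop falls back to its default of `file:///` and both filesystems end up under the `file` scheme. The sketch below models that with the standard library only. It is a simplified stand-in, not Beam's actual code: `SchemeConflictSketch` and its methods are hypothetical, and the assumption that the Hadoop-backed filesystem registers under the scheme of `fs.defaultFS` is inferred from the error message.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SchemeConflictSketch {
    // Hadoop's default value for fs.defaultFS when the key is not configured.
    static final String HADOOP_DEFAULT_FS = "file:///";

    // Assumption: the Hadoop-backed filesystem registers under the scheme of fs.defaultFS.
    static String hadoopScheme(Map<String, String> conf) {
        return URI.create(conf.getOrDefault("fs.defaultFS", HADOOP_DEFAULT_FS)).getScheme();
    }

    // Simplified "one filesystem per scheme" check; throws when two share a scheme.
    static void verifySchemesAreUnique(Map<String, String> conf) {
        Map<String, List<String>> byScheme = new TreeMap<>();
        byScheme.computeIfAbsent("file", k -> new ArrayList<>()).add("LocalFileSystem");
        byScheme.computeIfAbsent(hadoopScheme(conf), k -> new ArrayList<>()).add("HadoopFileSystem");
        for (Map.Entry<String, List<String>> e : byScheme.entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalStateException(
                    "Scheme: [" + e.getKey() + "] has conflicting filesystems: " + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // fs.defaultFS points at HDFS, so the schemes differ and the check passes.
        verifySchemesAreUnique(Collections.singletonMap("fs.defaultFS", "hdfs://namenode:8020"));
        System.out.println("hdfs configuration: no conflict");
        try {
            // Only an unrelated key is set, so the file:/// default applies and the check fails.
            verifySchemesAreUnique(
                Collections.singletonMap("dfs.client.use.datanode.hostname", "true"));
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Under this model, adding `fs.defaultFS` with an `hdfs://` address to the configuration is what moves the Hadoop filesystem off the `file` scheme.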



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)