beam-commits mailing list archives

From François Wagner (JIRA) <j...@apache.org>
Subject [jira] [Created] (BEAM-2429) Conflicting filesystems with used of HadoopFileSystem
Date Fri, 09 Jun 2017 09:11:18 GMT
François Wagner created BEAM-2429:
-------------------------------------

             Summary: Conflicting filesystems with used of HadoopFileSystem
                 Key: BEAM-2429
                 URL: https://issues.apache.org/jira/browse/BEAM-2429
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-extensions
    Affects Versions: 2.0.0
            Reporter: François Wagner
            Assignee: Davor Bonaci


I'm facing an issue when trying to use HadoopFileSystem in my pipeline. It looks like HadoopFileSystem
is registering itself under the `file` scheme (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79),
so the following exception is thrown when HadoopFileSystem is registered alongside LocalFileSystem.

java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: [org.apache.beam.sdk.io.LocalFileSystem,
org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
	at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
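
To illustrate the kind of check that seems to be failing, here is a minimal, self-contained sketch of a scheme-uniqueness verification. It is not Beam's actual code: the class `SchemeConflictSketch`, the `Registration` type, and the `buildSchemeMap` method are hypothetical names made up for this example. It only assumes that each filesystem is registered under one scheme and that two filesystems under the same scheme are rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SchemeConflictSketch {

  /** Hypothetical stand-in for one registered filesystem: a URI scheme plus a class name. */
  static final class Registration {
    final String scheme;
    final String fileSystemName;

    Registration(String scheme, String fileSystemName) {
      this.scheme = scheme;
      this.fileSystemName = fileSystemName;
    }
  }

  /** Groups registrations by scheme and fails if any scheme has more than one filesystem. */
  static Map<String, String> buildSchemeMap(List<Registration> registrations) {
    Map<String, List<String>> byScheme = new TreeMap<>();
    for (Registration r : registrations) {
      byScheme.computeIfAbsent(r.scheme, k -> new ArrayList<>()).add(r.fileSystemName);
    }
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : byScheme.entrySet()) {
      if (e.getValue().size() > 1) {
        throw new IllegalStateException(
            String.format("Scheme: [%s] has conflicting filesystems: %s",
                e.getKey(), e.getValue()));
      }
      result.put(e.getKey(), e.getValue().get(0));
    }
    return result;
  }

  public static void main(String[] args) {
    // Two filesystems both claiming the `file` scheme, as in the report above.
    List<Registration> conflicting = Arrays.asList(
        new Registration("file", "org.apache.beam.sdk.io.LocalFileSystem"),
        new Registration("file", "org.apache.beam.sdk.io.hdfs.HadoopFileSystem"));
    try {
      buildSchemeMap(conflicting);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this sketch, registering the Hadoop filesystem under `hdfs` instead of `file` makes the check pass, which is the behaviour I would expect out of the box.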

What is the correct way to handle `hdfs` URLs out of the box with TextIO and AvroIO?

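    // Assumed imports: java.util.ArrayList, java.util.List,
    // org.apache.hadoop.conf.Configuration, org.apache.beam.sdk.Pipeline,
    // org.apache.beam.sdk.options.PipelineOptionsFactory,
    // org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions.
    String[] args = new String[]{
        "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": \"true\"}]"};
    HadoopFileSystemOptions options = PipelineOptionsFactory
        .fromArgs(args)
        .withValidation()
        .as(HadoopFileSystemOptions.class);

    // Programmatic equivalent of the flag above. The declarations of `config`
    // and `configuration` were missing from the snippet and are assumed here.
    Configuration config = new Configuration();
    config.set("dfs.client.use.datanode.hostname", "true");
    List<Configuration> configuration = new ArrayList<>();
    configuration.add(config);
    options.setHdfsConfiguration(configuration);
    Pipeline pipeline = Pipeline.create(options);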





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
