beam-commits mailing list archives

From "Aviem Zur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem
Date Thu, 20 Apr 2017 08:36:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976308#comment-15976308
] 

Aviem Zur commented on BEAM-2005:
---------------------------------

Yes, it makes sense for the code to live in an extension, and a BoM/archetypes plus good documentation
will help users get up and running.

However, I still think the case I mentioned will happen in practice:
bq. a user creates a project from scratch, adds a dependency on a runner (say direct runner),
uses TextIO to do a word count, and it works for them when passing "file://path/to/file". Changing
this to "hdfs://path/to/file" will not work.

So in this case the user will have to resort to looking up documentation on how to achieve
what they wanted.

What we could do, if we don't want to have {{core}} bloated with dependencies on all filesystems
out of the box, is at least have a {{scheme}} -> {{module}} mapping which can be used to
display an informative error message such as:
bq. To enable HDFS support add a dependency on sdk-java-extensions-hadoop
There would be a similar message for each of the other filesystem schemes supported by our extension
modules.
This could be achieved by a static {{Map<String, String>}} in {{core}}.
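
A minimal sketch of what such a mapping might look like, for illustration only. The class {{FileSystemHints}} and its methods are hypothetical names, not existing Beam API, and the only entry taken from this comment is {{hdfs}} -> {{sdk-java-extensions-hadoop}}; entries for other schemes would have to be filled in with the real module names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Beam): maps a URI scheme to the module that supports it. */
final class FileSystemHints {

  // Static scheme -> module map. Only the hdfs entry comes from the comment above;
  // further schemes would be added here with their actual extension module names.
  private static final Map<String, String> SCHEME_TO_MODULE;

  static {
    Map<String, String> m = new HashMap<>();
    m.put("hdfs", "sdk-java-extensions-hadoop");
    SCHEME_TO_MODULE = Collections.unmodifiableMap(m);
  }

  private FileSystemHints() {}

  /** Returns the lower-cased scheme of a spec like "hdfs://path/to/file", or "file" if it has none. */
  static String schemeOf(String spec) {
    int idx = spec.indexOf("://");
    if (idx <= 0) {
      return "file"; // assumption: a spec without a scheme is treated as a local file
    }
    return spec.substring(0, idx).toLowerCase(Locale.ROOT);
  }

  /** Builds the error message to show when no filesystem is registered for the spec's scheme. */
  static String missingFileSystemMessage(String spec) {
    String scheme = schemeOf(spec);
    String module = SCHEME_TO_MODULE.get(scheme);
    if (module == null) {
      // Unknown scheme: no module hint is available, so fall back to a generic message.
      return "No filesystem found for scheme " + scheme;
    }
    return "To enable " + scheme.toUpperCase(Locale.ROOT)
        + " support add a dependency on " + module;
  }
}
```

With this sketch, {{FileSystemHints.missingFileSystemMessage("hdfs://path/to/file")}} yields the message quoted above, while an unmapped scheme falls back to a generic "No filesystem found" message. The map holds only strings, so {{core}} would not need a dependency on any of the filesystem modules.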

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many different places.
> We should add a Hadoop FileSystem implementation (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
> - that would enable us to read from any file system that implements FileSystem (including
> HDFS, Azure, S3, etc.)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
