beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2031) Hadoop FileSystem needs to receive Hadoop Configuration
Date Wed, 26 Apr 2017 01:16:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983958#comment-15983958 ]

Stephen Sisk commented on BEAM-2031:
------------------------------------

Yeah, in that doc, I think "2. Construct FileSystemConfig (conceptually a serializable
map)" is the world I'm hoping to live in :)

Luke and I were talking, and we think there's a possible way to make multiple HadoopFileSystem
configurations work, if the assumptions below are true.

Assumptions:
* fs.default.name is always set on the Hadoop Configurations used to connect to filesystems
* fs.default.name is always a unique prefix that distinguishes the different servers/useful
configurations the user cares about
* the user always uses URI prefixes that match fs.default.name
(I'm not sure whether those assumptions are true, given my naivete in the Hadoop ecosystem)

Given those, we could:
* Allow the user to provide a list of configurations (via PipelineOptions)
* Register for the unique set of schemes present in the configurations (might require some
small changes to allow this to work)
* Inside of HadoopFileSystem, maintain a map of fs.default.name -> configuration
* When HadoopFileSystem is given a URI, it would just look up the configuration based on
the prefix, and then use that configuration.
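
The lookup described in the steps above could be sketched roughly as follows. This is
only an illustration under the stated assumptions, not Beam or Hadoop code: the class
name ConfigurationRegistry and its methods are hypothetical, and a plain
Map<String, String> stands in for a Hadoop Configuration so the sketch has no Hadoop
dependency.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: maps fs.default.name -> configuration and resolves a URI by prefix.
// A Map<String, String> stands in for org.apache.hadoop.conf.Configuration.
final class ConfigurationRegistry {
  static final String DEFAULT_FS_KEY = "fs.default.name";

  private final Map<String, Map<String, String>> byDefaultFs = new LinkedHashMap<>();

  ConfigurationRegistry(List<Map<String, String>> configurations) {
    for (Map<String, String> conf : configurations) {
      String defaultFs = conf.get(DEFAULT_FS_KEY);
      // Assumption 1: fs.default.name is always set.
      if (defaultFs == null) {
        throw new IllegalArgumentException(DEFAULT_FS_KEY + " is not set");
      }
      if (URI.create(defaultFs).getScheme() == null) {
        throw new IllegalArgumentException("No scheme in " + defaultFs);
      }
      // Assumption 2: fs.default.name is unique per configuration.
      if (byDefaultFs.putIfAbsent(defaultFs, conf) != null) {
        throw new IllegalArgumentException("Duplicate " + DEFAULT_FS_KEY + ": " + defaultFs);
      }
    }
  }

  // The unique set of schemes to register for, e.g. ["hdfs"].
  Set<String> schemes() {
    Set<String> schemes = new TreeSet<>();
    for (String defaultFs : byDefaultFs.keySet()) {
      schemes.add(URI.create(defaultFs).getScheme());
    }
    return schemes;
  }

  // Assumption 3: the user's URI starts with some fs.default.name.
  // The longest matching prefix wins; empty if nothing matches.
  Optional<Map<String, String>> lookup(String uri) {
    String best = null;
    for (String prefix : byDefaultFs.keySet()) {
      if (uri.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
        best = prefix;
      }
    }
    return Optional.ofNullable(best).map(byDefaultFs::get);
  }
}
```

With two configurations whose fs.default.name values are hdfs://namenode-a:8020 and
hdfs://namenode-b:8020 (made-up hosts), looking up hdfs://namenode-b:8020/data/part-0
would return the second one. Plain string-prefix matching is deliberately naive here
(hdfs://namenode-a:8020 is also a string prefix of hdfs://namenode-a:80200/...), so a
real version would probably compare scheme and authority instead.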

This is aspirational for the first stable release, but if anyone has insights into whether or
not those assumptions are true, that'd be useful.

This may be moot if we use option 2 (Construct FileSystemConfig) in Davor's doc.

> Hadoop FileSystem needs to receive Hadoop Configuration
> -------------------------------------------------------
>
>                 Key: BEAM-2031
>                 URL: https://issues.apache.org/jira/browse/BEAM-2031
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Since Beam FileSystem objects are configured via PipelineOptions, we need to pass a Hadoop
> Configuration through PipelineOptions. I think that's very solvable, but it does seem semi-complicated.
> cc [~peihe0@gmail.com] I believe you mentioned in the past that you had an answer to
> this - is that written down anywhere?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
