beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem
Date Thu, 20 Apr 2017 20:56:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977497#comment-15977497 ]

Stephen Sisk commented on BEAM-2005:
------------------------------------

Some additional questions that I think are related to the registration question we're
discussing here:

As discussed above, Hadoop FileSystem can be used to access multiple types of filesystems
(S3, HDFS, etc.).

1) However, FileSystemRegistrar only allows one scheme to be registered per FileSystemRegistrar.
That means a single class can only handle one scheme.  We could either change the interface
to allow registering multiple schemes, or create multiple classes that all inherit from a base
class and each declare a separate scheme (e.g. S3HadoopFileSystem, HdfsHadoopFileSystem, etc.).
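
To make the two options in (1) concrete, here is a minimal sketch in plain Java. It does not
use Beam's actual FileSystemRegistrar API: every interface and class name below
(SingleSchemeRegistrar, MultiSchemeRegistrar, SchemeRegistry, etc.) is hypothetical and only
models the shape of each choice.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a registrar that can declare exactly one scheme.
interface SingleSchemeRegistrar {
    String getScheme();
}

// Hypothetical changed interface: one registrar may declare several schemes.
interface MultiSchemeRegistrar {
    List<String> getSchemes();
}

// Option "base class": shared Hadoop-backed logic lives here,
// and each subclass declares a single scheme.
abstract class HadoopRegistrarBase implements SingleSchemeRegistrar {
    String describe() {
        return "hadoop-backed:" + getScheme();
    }
}

class HdfsHadoopRegistrar extends HadoopRegistrarBase {
    @Override
    public String getScheme() {
        return "hdfs";
    }
}

class S3HadoopRegistrar extends HadoopRegistrarBase {
    @Override
    public String getScheme() {
        return "s3";
    }
}

// Option "change the interface": a single class covers every scheme it supports.
class HadoopMultiSchemeRegistrar implements MultiSchemeRegistrar {
    @Override
    public List<String> getSchemes() {
        return Arrays.asList("hdfs", "s3");
    }
}

// Hypothetical registry mapping each scheme to the name of the class that handles it.
class SchemeRegistry {
    private final Map<String, String> handlers = new HashMap<>();

    void registerSingle(SingleSchemeRegistrar registrar) {
        put(registrar.getScheme(), registrar.getClass().getSimpleName());
    }

    void registerMulti(MultiSchemeRegistrar registrar) {
        for (String scheme : registrar.getSchemes()) {
            put(scheme, registrar.getClass().getSimpleName());
        }
    }

    String handlerFor(String scheme) {
        return handlers.get(scheme);
    }

    // Reject a second registration for the same scheme.
    private void put(String scheme, String handler) {
        if (handlers.putIfAbsent(scheme, handler) != null) {
            throw new IllegalStateException("Scheme already registered: " + scheme);
        }
    }
}
```

Either way the registry ends up with one handler per scheme. The difference is whether that
costs one class per scheme or a change to the registrar interface.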

2) Additionally, Hadoop filesystems are configured via Configuration objects (e.g. the options
discussed here for S3: https://issues.apache.org/jira/browse/HADOOP-10400). That means that
a user should probably be able to configure those options and have multiple connections
per scheme (i.e. "I want to connect to two different HDFS instances"). Looking at how
the Beam FileSystem is currently implemented, it's not clear to me that it is possible today
to handle this scenario.
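
One possible way to think about the scenario in (2), sketched in plain Java: keep one
configuration per scheme and authority rather than one per scheme. This is an assumption for
illustration, not how Beam or Hadoop resolves configuration. PerClusterConfigs is a
hypothetical class, and a Map<String, String> stands in for a Hadoop Configuration object.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table: one configuration per (scheme, authority) pair,
// so two HDFS instances can carry different settings.
class PerClusterConfigs {
    private final Map<String, Map<String, String>> byCluster = new HashMap<>();

    // Store a defensive copy of the settings for one cluster.
    void put(String scheme, String authority, Map<String, String> conf) {
        byCluster.put(key(scheme, authority), new HashMap<>(conf));
    }

    // Pick the configuration from the scheme and authority of the path being accessed.
    Map<String, String> configFor(URI path) {
        Map<String, String> conf = byCluster.get(key(path.getScheme(), path.getAuthority()));
        if (conf == null) {
            throw new IllegalArgumentException("No configuration registered for " + path);
        }
        return conf;
    }

    private static String key(String scheme, String authority) {
        return scheme + "://" + authority;
    }
}
```

With this shape, hdfs://namenode-a:8020/... and hdfs://namenode-b:8020/... (made-up hosts)
would resolve to different configurations even though both use the hdfs scheme.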

This second question shouldn't block getting a simple "I can read from one HDFS instance" case
working, but it does seem important in the long run.

cc [~davor] [~dhalperi@google.com]

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many different places.
> We should add a Hadoop FileSystem implementation (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
> - that would enable us to read from any file system that implements FileSystem (including
> HDFS, azure, s3, etc..)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
