beam-commits mailing list archives

From "Chamikara Jayalath (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
Date Mon, 20 Jun 2016 18:38:58 GMT
Chamikara Jayalath created BEAM-360:
---------------------------------------

             Summary: Add a framework for creating Python-SDK sources for new file types
                 Key: BEAM-360
                 URL: https://issues.apache.org/jira/browse/BEAM-360
             Project: Beam
          Issue Type: New Feature
          Components: sdk-py
            Reporter: Chamikara Jayalath
            Assignee: Chamikara Jayalath


We already have a framework for creating new sources for the Beam Python SDK - https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326

It would be great to add a framework on top of this that encapsulates the logic common
to file-based sources. This framework could include the following features:
(1) glob expansion
(2) support for new file-systems
(3) dynamic work rebalancing based on byte offsets
(4) support for reading compressed files.
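
To make items (1), (3) and (4) concrete, here is a rough standalone sketch in plain Python. It does not use any Beam API. The names expand_pattern, split_by_offsets and read_lines are hypothetical and chosen only for illustration, and newline-delimited records are assumed. It shows static splitting into byte ranges only; true dynamic rebalancing (re-splitting a range while it is being read) and pluggable file-systems (item 2) are left out. Gzip files are treated as unsplittable and detected by a ".gz" suffix, which is also an assumption.

```python
import glob
import gzip
import os


def expand_pattern(pattern):
    """Return the sorted list of regular files matching a glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def split_by_offsets(path, desired_bundle_size):
    """Yield (path, start, stop) byte ranges that together cover the file.

    Gzip files cannot be read from an arbitrary offset, so they are
    emitted as a single range (assumption: detected by file suffix).
    """
    size = os.path.getsize(path)
    if path.endswith('.gz'):
        yield (path, 0, size)
        return
    start = 0
    while start < size:
        stop = min(start + desired_bundle_size, size)
        yield (path, start, stop)
        start = stop


def read_lines(path, start, stop):
    """Yield the lines whose first byte lies in [start, stop)."""
    if path.endswith('.gz'):
        # Unsplittable: ignore the offsets and read the whole file.
        with gzip.open(path, 'rb') as f:
            for line in f:
                yield line
        return
    with open(path, 'rb') as f:
        if start > 0:
            # Skip the tail of a line that began before this range.
            f.seek(start - 1)
            f.readline()
        while f.tell() < stop:
            line = f.readline()
            if not line:
                break
            yield line
```

A caller would expand a pattern, split each matched file, and read every range; each line is then produced by exactly one range, even when a range boundary falls in the middle of a line.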

The Java SDK has a similar framework, available at - https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
