beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-674) Add GridFS support to MongoDB IO
Date Mon, 26 Sep 2016 15:12:20 GMT

    [ https://issues.apache.org/jira/browse/BEAM-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523335#comment-15523335 ]

ASF GitHub Bot commented on BEAM-674:
-------------------------------------

GitHub user dkulp opened a pull request:

    https://github.com/apache/incubator-beam/pull/1003

    [BEAM-674] Source part of GridFS IO

    This is the "Source" part of the GridFS-based IO for Beam. (I will work on the Sink
    next, but would like to get this reviewed and merged first.) By default, each file is
    parsed as a text file, line by line, but a parser function can be provided that takes
    the InputStream and parses it in whatever way is required.
    
    For runners that can split a source into bundles, it attempts to assign the files in
    the grid to different bundles.
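
    The two ideas above (a pluggable parser with a line-by-line default, and spreading
    files across bundles) can be illustrated with a minimal, self-contained sketch. It is
    not the code in this pull request: GridFsSourceSketch, FileParser, LINE_PARSER and
    assignToBundles are hypothetical names, and the greedy size-based grouping is only an
    assumption about how files might be assigned to bundles. It uses only the Java
    standard library, not the Beam or MongoDB APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the pull request.
class GridFsSourceSketch {

  /** Hypothetical parser hook: turns the bytes of one stored file into records. */
  interface FileParser<T> {
    List<T> parse(InputStream in);
  }

  /** Default behaviour described above: treat the file as text, one record per line. */
  static final FileParser<String> LINE_PARSER = in -> {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  };

  /**
   * Assumed strategy: greedily group files (by index) into bundles, starting a new
   * bundle once adding the next file would exceed desiredBundleSizeBytes.
   * A single file larger than the limit still gets a bundle of its own.
   */
  static List<List<Integer>> assignToBundles(long[] fileSizes, long desiredBundleSizeBytes) {
    List<List<Integer>> bundles = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentSize = 0;
    for (int i = 0; i < fileSizes.length; i++) {
      if (!current.isEmpty() && currentSize + fileSizes[i] > desiredBundleSizeBytes) {
        bundles.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(i);
      currentSize += fileSizes[i];
    }
    if (!current.isEmpty()) {
      bundles.add(current);
    }
    return bundles;
  }
}
```

    A custom parser would simply be another FileParser implementation, for example one
    that decodes a binary format instead of splitting on line breaks.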


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dkulp/incubator-beam gridfs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1003
    
----
commit d5cdc2429622f65a762774de8b5baf15334e55e2
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-16T20:58:56Z

    Add GridFS io

commit a9212662744c14f10cd811540c3e9268c32c25c4
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-16T21:19:50Z

    Fix checkstyle issues

commit cee0a06b6a465a276c2c5410d7d3f9af703982d4
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T17:22:44Z

    Attempt to get a converter in there

commit fafa8fa607f22eacf918abb13419f28df9d2a8e9
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T17:32:44Z

    Fix javac compile problem

commit 7e9872f12c74902f1a23e5a27eb0027ae753947a
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T17:50:11Z

    Force a serializable

commit 265747946864b226235ee5b758e6c10b7cc3992f
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T17:56:03Z

    Add the needed coder

commit 4f54495afe7ff4768d873350c345d39905d812fc
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T18:02:39Z

    Change to using the GridFSDBFile instead of InputStream so the parsingFn can have access to all the metadata

commit cbeebf02542a5e5a5f4b9a6c370b1b68b46d2deb
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T18:25:23Z

    Flip to allowing the parser to have complete control over how the item is added to the collection

commit a08007b9f444fedcde78ab38c6cdf505b3864c61
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-19T18:26:19Z

    Fix unused imports

commit a4840e98d891d3fa783654a472af06c4d399a929
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-21T19:51:45Z

    Add test for the parser functionality and cleanup some of that code

commit 438a792a796be77186d79aa3fdb221efcced6d4f
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-21T20:01:33Z

    Move the coder out from the parser

commit e8fcdbf3cebd6fa4648f328484dee07fec35b21a
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-22T12:36:50Z

    Fix test

commit 1d1a373fc7cec4e78bf0e618a902c15005fc36b4
Author: Daniel Kulp <dkulp@apache.org>
Date:   2016-09-23T14:49:53Z

    Flip to using BoundedSource so it can be broken up into bundles

----


> Add GridFS support to MongoDB IO
> --------------------------------
>
>                 Key: BEAM-674
>                 URL: https://issues.apache.org/jira/browse/BEAM-674
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Daniel Kulp
>            Assignee: Daniel Kulp
>
> MongoDB has an "extension" called GridFS that allows storing very large "files" in
> the MongoDB database in a relatively efficient way. It would be good to add a GridFS
> API-based IO to allow retrieving the data for processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
