beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-437) Data-dependent BigQueryIO in batch
Date Mon, 03 Apr 2017 21:06:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954162#comment-15954162 ]

ASF GitHub Bot commented on BEAM-437:
-------------------------------------

GitHub user reuvenlax opened a pull request:

    https://github.com/apache/beam/pull/2415

    [BEAM-437] Support data-dependent writes using BigQuery batch load jobs

    This pull request adds support for data-dependent writes when using batch load jobs. This
is accomplished by refactoring BigQueryIO into separate transforms. The first is a common
PrepareWrite transform that determines which table each record should go to; it is followed
by transforms that know how to interpret that table assignment.
    
    One side benefit of this refactoring is that the different components can be used on their
own. For example, one request has been to allow dynamic creation of datasets in BigQueryIO.
A user can now accomplish this by running PrepareWrite themselves, followed by their own custom
transform to create datasets, and then the remaining transform.
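
    The staged structure described above can be sketched in plain Java, without the Beam SDK.
This is only an illustration of the idea, not code from the pull request: the class name, the
method names prepareWrite and datasetsToCreate, and the "dataset.table" spec format are all
hypothetical. The first stage assigns each record to a destination table, and a user-supplied
middle stage inspects those destinations before any write would happen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from BigQueryIO.
public class StagedWriteSketch {

    // Stage 1 (stands in for PrepareWrite): route each record to the
    // table chosen by a data-dependent function.
    static <T> Map<String, List<T>> prepareWrite(List<T> records, Function<T, String> tableFn) {
        Map<String, List<T>> byTable = new LinkedHashMap<>();
        for (T record : records) {
            byTable.computeIfAbsent(tableFn.apply(record), k -> new ArrayList<>()).add(record);
        }
        return byTable;
    }

    // Custom middle stage: work out which datasets the routed tables need.
    // Assumes table specs of the form "dataset.table".
    static Set<String> datasetsToCreate(Set<String> tableSpecs) {
        Set<String> datasets = new LinkedHashSet<>();
        for (String spec : tableSpecs) {
            datasets.add(spec.substring(0, spec.indexOf('.')));
        }
        return datasets;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("eu,click", "us,view", "eu,view");
        // The destination depends on the record's own contents (its region field).
        Map<String, List<String>> routed =
            prepareWrite(events, e -> "logs_" + e.split(",")[0] + ".events");
        System.out.println(routed);
        System.out.println(datasetsToCreate(routed.keySet()));
    }
}
```

    In a real pipeline the remaining write stage would consume the routed records; it is left
out here because its interface is not described in this message.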
    
    To test this, BigQueryIOTest was modified to use a proper fake service, removing
the dependency on Mockito.
    
    R: @jkff 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam dynamic_writes_in_batch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2415
    
----
commit 6088bf19dc03bb5ca0ccb760c52793ae27dfc06b
Author: Reuven Lax <relax@google.com>
Date:   2017-03-28T18:21:59Z

    Use tableRefFunction throughout BigQueryIO. Constant table writes use ConstantTableSpecFunction.

commit 73fa547e4ca2b44c4f11d7c7ed4d7ac77a701ad5
Author: Reuven Lax <relax@google.com>
Date:   2017-03-28T19:53:27Z

    Add PrepareWrite transform.

commit 60040c4991ee2fe5572d3dd7e2dfd381e21cead8
Author: Reuven Lax <relax@google.com>
Date:   2017-03-29T02:34:56Z

    Refactor streaming write branch into separate reusable components.

commit 359685ab997c934837c601610fec471b3da1dcbd
Author: Reuven Lax <relax@google.com>
Date:   2017-03-29T14:34:10Z

    Refactor batch load job path, and add support for data-dependent tables.

commit c9a1f2916af5cd2837d4d73887005e3b2ceff401
Author: Reuven Lax <relax@google.com>
Date:   2017-03-31T18:19:25Z

    Refactor batch loads, and add support for windowed writes.

commit 477b14f4952881d965f22b7591da1032dcfd0495
Author: Reuven Lax <relax@google.com>
Date:   2017-03-31T21:16:48Z

    Update tests

commit a6fb0292879b7ff9a68de2884417a4efd21f6479
Author: Reuven Lax <relax@google.com>
Date:   2017-04-01T01:53:04Z

    testing changes

commit 5a2a2dc55bb7339a5c17280ed6ad66cb13eef54d
Author: Reuven Lax <relax@google.com>
Date:   2017-04-02T18:32:37Z

    Fix more tests

commit cc146874470b51b0295a02cdcb81effda03372af
Author: Reuven Lax <relax@google.com>
Date:   2017-04-02T18:37:06Z

    Fix CheckStyle issues

commit 89f2dc88431e71f8d11cd9942c2ef653bfc1a2c1
Author: Reuven Lax <relax@google.com>
Date:   2017-04-03T02:47:03Z

    Final tests all work now

commit 6662121da44f16d79718c68dccf6eb6a86329268
Author: Reuven Lax <relax@google.com>
Date:   2017-04-03T02:57:50Z

    Some cleanups and comments

commit 257ccc06f10cd048b8190e124b241f3bd98c647b
Author: Reuven Lax <relax@google.com>
Date:   2017-04-03T03:27:16Z

    Remove ReturnT

commit 1ad3720c0273a808cafc2dd4d6e096b4f492c42b
Author: Reuven Lax <relax@google.com>
Date:   2017-04-03T04:39:50Z

    Separate streaming writes into two pluggable components - CreateTables, and StreamingWriteTables.

commit a111b148a2bf8bbb5f1119c0bff922c0801d0582
Author: Reuven Lax <relax@google.com>
Date:   2017-04-03T04:43:16Z

    Checkstyle fixes

----


> Data-dependent BigQueryIO in batch
> ----------------------------------
>
>                 Key: BEAM-437
>                 URL: https://issues.apache.org/jira/browse/BEAM-437
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Daniel Halperin
>            Assignee: Reuven Lax
>            Priority: Minor
>
> Blocked by [BEAM-92].
> Right now, we use BigQuery's streaming write API when using window-dependent tables in
> BigQuery. We should:
> 1. Support data-dependent tables as well.
> 2. Find a way to use the batch write API.
> 3. This requires careful design to be idempotent or, at least, as close to idempotent
> as possible.
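
One way to approach the idempotency concern in item 3 is to give every batch load a
deterministic job ID, so that a retried bundle resubmits the same job rather than a new one.
The sketch below is an assumption made for illustration, not the design adopted in BEAM-437:
the class name, the jobId helper, and the ID layout are hypothetical, and it presumes the
service treats a job ID as unique, so that a resubmission under an existing ID does not load
the data a second time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; this is not the job-naming scheme used by BigQueryIO.
public class LoadJobIdSketch {

    // Builds the same ID for the same (prefix, table, partition) triple,
    // so a retry reuses the ID instead of creating a fresh job.
    static String jobId(String stablePrefix, String tableSpec, int partition) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(tableSpec.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            // Eight bytes of the hash keep the ID short while separating tables.
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", hash[i] & 0xff));
            }
            return stablePrefix + "_" + hex + "_" + String.format("%05d", partition);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The prefix must be chosen once per pipeline run and survive retries.
        String first = jobId("beam_load_run42", "project:dataset.table_a", 3);
        String retry = jobId("beam_load_run42", "project:dataset.table_a", 3);
        System.out.println(first.equals(retry));
    }
}
```

The prefix has to be fixed once per pipeline run and preserved across retries; an ID derived
from a timestamp or random value at retry time would defeat the purpose.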



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
