beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3169) WriteFiles data loss with some triggers
Date Tue, 14 Nov 2017 00:08:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250524#comment-16250524 ]

ASF GitHub Bot commented on BEAM-3169:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/4124

    [BEAM-3169] Fixes a data loss bug in WriteFiles when used with fire-once triggers

    https://issues.apache.org/jira/browse/BEAM-3169
    
    This also required a bit of twiddling with the shard assignment logic. The gist of the
    change is replacing the pre-finalize GroupByKey (GBK) with Reshuffle. I audited all other
    usages of GBK in the SDK, and it appears that only this one is buggy: the others either
    explicitly set a repeated trigger before applying the GBK, or are applied directly to the
    user's input, where the user's trigger firing behavior is working as intended (WAI).
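
    To illustrate why a fire-once trigger in front of a grouping step can lose data, here
    is a minimal toy model in plain Java. It does not use the Beam API and is not the code
    from this pull request. The class name FireOnceTriggerDemo, the method group, and the
    notion of a "pane" as a batch of elements arriving together are assumptions made for
    illustration only. The sketch assumes that once a non-repeated trigger has fired, it is
    finished and later input for the same window is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Beam code) of a grouping step whose trigger may finish after firing. */
public class FireOnceTriggerDemo {

    /**
     * Feeds panes (batches of elements that arrive together) through a simulated
     * grouping step and returns every element that reaches the output.
     *
     * @param panes batches of input, in arrival order
     * @param fireOnce if true, the trigger finishes after its first firing and all
     *     later panes are dropped; if false, the trigger fires for every pane
     */
    public static List<String> group(List<List<String>> panes, boolean fireOnce) {
        List<String> emitted = new ArrayList<>();
        boolean finished = false;
        for (List<String> pane : panes) {
            if (finished) {
                // The trigger has already finished: this pane is silently lost.
                continue;
            }
            emitted.addAll(pane);
            if (fireOnce) {
                finished = true;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical file results arriving in two separate panes.
        List<List<String>> panes = Arrays.asList(
                Arrays.asList("file-1", "file-2"),
                Arrays.asList("file-3"));
        System.out.println("fire-once: " + group(panes, true));
        System.out.println("repeated:  " + group(panes, false));
    }
}
```

    In this model the fire-once run emits only file-1 and file-2, while the repeated run
    also emits file-3. That is the kind of loss the PR avoids by materializing file results
    through Reshuffle instead of a GBK that inherits the user's trigger.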

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam write-files-data-loss

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/4124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4124
    
----
commit 17c4af7c5a3a9475a284afd4fe413cb57016e7e5
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-11-13T19:09:01Z

    Slightly simplifies fixed sharding

commit 3926abf84c681f3b6918a3d691ca2cb00a2e40e6
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-11-13T19:58:11Z

    More clear and consistent shard number assignment logic

commit e52d3d8d49d6ff2b2b767f8069e1f71f8c42c9ae
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-11-13T20:28:34Z

    Materializes file results via Reshuffle rather than GBK

commit 1a625ecf313b6ff03311464d40a5515736cbbdd7
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-11-13T23:55:44Z

    Adds test for WriteFiles with a fire-once trigger

commit cba9ca163c621c1b187965226748596e2f5f8600
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-11-14T00:04:03Z

    makes checkstyle happy

----


> WriteFiles data loss with some triggers
> ---------------------------------------
>
>                 Key: BEAM-3169
>                 URL: https://issues.apache.org/jira/browse/BEAM-3169
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Critical
>             Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/47113773/dataflow-2-1-0-streaming-application-is-not-cleaning-temp-folders/47142671?noredirect=1#comment81401472_47142671
> Details in comments



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
