flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mbalassi <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2283] [streaming] Make grouped reduce/f...
Date Mon, 21 Sep 2015 11:20:15 GMT
GitHub user mbalassi opened a pull request:


    [FLINK-2283] [streaming] Make grouped reduce/fold/aggregations stateful

    There is an open discussion at the related ticket [1] about fully removing the operators
that I touch and partially remove here. I can accept both conclusions of the discussion, but
even in the scenario when the operators get removed from the API afterwards the PR has certain
merit to it:
    1. Cleans up the unused `StreamReduce` and `StreamFold` operators which should be removed
either way.
    2. Adds an integration test for ensuring that not only user defined functions, but internal
streaming operators can properly rely on the `OperatorState` interface. To do this it currently
relies on the grouped reduce/fold aggregations, but this is just as important for windowing
states, which are not properly checkpointed yet.
    3. Makes the grouped fold/reduce operators stateful, so that the previous test can be
    Some justification for the implementation choices:
    1. @gyfora has suggested to use the partitioned state at the ticket [1] instead of the
manual map creation. In this scenario the grouped operators would not be unit testable any
more as they would be dependent on the state partitioner information found in the keyed datastream.
I decided against it.
    2. @StephanEwen has recently advised against adding unnecessary integration tests. [2]
This is a feature that can only be tested as an integration test. I personally feel the need
to cover internal operators with a checkpointing test despite the fact they currently use
exactly the same mechanism as the UDFs as this implementation might be subject to slight changes.
    3. Elaborating on the previous point the `OperatorState` currently storing the internal
state is also accessible for the user. This is an undesirable feature and might lead to accidental
overwrite of the state. I am opening a Jira ticket for this. 
    [1] https://issues.apache.org/jira/browse/FLINK-2283
    [2] https://mail-archives.apache.org/mod_mbox/flink-dev/201509.mbox/%3CCANC1h_vvekciNVDzqCb8N4E5Kfzu4e1Mosnse1%3DV11HXnD2PBQ%40mail.gmail.com%3E

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mbalassi/flink aggregator-states

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1155
commit 29ca808ccb8a1d705927eabb492e70df5e5af06c
Author: mbalassi <mbalassi@apache.org>
Date:   2015-09-11T14:32:09Z

    [streaming] Removed unused StreamReduce
    Refactored corresponding tests, some minor cleanups.

commit 4bd1dd035780402919bb5257274e9258457dadf3
Author: mbalassi <mbalassi@apache.org>
Date:   2015-09-13T06:19:07Z

    [FLINK-2283] [streaming] grouped reduce and fold operators checkpoint state

commit 3688a7c98500179f454e1641aedd7758b1fdc644
Author: mbalassi <mbalassi@apache.org>
Date:   2015-09-20T20:27:11Z

    [FLINK-2283] [streaming] Test for checkpointing in internal operators


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message