flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1730) Add a FlinkTools.persist style method to the Data Set.
Date Tue, 01 Sep 2015 13:54:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725427#comment-14725427

ASF GitHub Bot commented on FLINK-1730:

GitHub user sachingoel0101 opened a pull request:


    [FLINK-1730]Persist operator on Data Sets

    This PR introduces a `persist` operation on `DataSet` which allows persisting the data
set in memory, allowing for direct access if this data set is operated on again and again.
    The idea behind the implementation is this:
    1. A `PersistOperator` extending a `SingleInputUdfOperator` for common api and Java API.
    2. A `Persist` driver strategy which allows the Job Graph to generate a `PersistNode`,
which just uses a `NoOpDriver` to forward results from input to output.
    3. `RegularPactTask` determines whether it is a Persist task and accordingly uses a `SpillingResettableMutableObjectIterator`
to read the input and persist them.
    4. To make the results truly persistent, the `MemorySegment`s must not be freed when the
`Task` ends. To this end, I have created a `DummyPersistInvokable` which does nothing. It
just prevents freeing of memory.
    5. All persisted memory segments are cleared out when the `MemoryManager` is shutting
down. There is a possibility of writing some kind of Cache clearing strategy here.
    For testing the functionality, I have written a test `PersistITCase` which generates 100
random Long values inside a Map function and persisted the output. Then, triggering the execution
twice must provide the same results.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink flink-1730

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1083
commit a22cc670697cc601facb164f6fd84ef6438c2499
Author: Sachin Goel <sachingoel0101@gmail.com>
Date:   2015-08-24T16:07:04Z

    Implemented a persist operator which caches elements into a Spilling


> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
> I think this is an operation that will be needed more prominently. Defining a point where
one long logical program is broken into different executions.

This message was sent by Atlassian JIRA

View raw message