flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1730) Add a FlinkTools.persist style method to the Data Set.
Date Tue, 25 Aug 2015 11:20:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711090#comment-14711090 ]

Stephan Ewen commented on FLINK-1730:

Flink's runtime already has a lot of support for storing an intermediate data set persistently,
starting in memory and going to disk if needed. It has not been connected to the APIs yet, because
the committers who worked on it prioritized other aspects (high availability) in the meantime.

This feature should really work at the network stack level (where caching buffers is most efficient)
and at the execution graph level (so that lost results can be recomputed upon failures and
dependencies are properly represented in the DAG).

Concerning other caching levels: I am not sure any other level is really needed. Any disk-level
caching should start in memory as long as memory is available. Any memory-level caching
that silently drops the cache and requires lineage-based recomputation is pretty unpredictable,
so memory-destaging-to-disk is the most sensible version.
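
To make the memory-destaging-to-disk idea concrete, here is a minimal sketch in Python. It is not Flink code and does not reflect how Flink's runtime is implemented; the class name `SpillingCache`, its methods, and the byte budget are all hypothetical, chosen only for illustration. It shows the one property argued for above: results stay in memory while the budget allows, and anything beyond that is written to disk rather than silently dropped.

```python
import os
import pickle
import tempfile


class SpillingCache:
    """Hypothetical illustration (not a Flink API): cache results in memory
    up to a byte budget and destage everything beyond it to disk."""

    def __init__(self, memory_budget_bytes):
        self.memory_budget = memory_budget_bytes
        self.memory_used = 0
        self.in_memory = {}  # key -> serialized bytes held in memory
        self.on_disk = {}    # key -> path of the spill file
        self.spill_dir = tempfile.mkdtemp(prefix="spill-")

    def put(self, key, value):
        # Assumption for simplicity: each key is written exactly once.
        if key in self.in_memory or key in self.on_disk:
            raise ValueError(f"key already cached: {key!r}")
        data = pickle.dumps(value)
        if self.memory_used + len(data) <= self.memory_budget:
            # Enough memory left: keep the serialized result in memory.
            self.in_memory[key] = data
            self.memory_used += len(data)
        else:
            # Budget exceeded: write to disk instead of dropping the result.
            path = os.path.join(self.spill_dir, f"{len(self.on_disk)}.bin")
            with open(path, "wb") as f:
                f.write(data)
            self.on_disk[key] = path

    def get(self, key):
        # Reads never trigger recomputation; the data is in one of two places.
        if key in self.in_memory:
            return pickle.loads(self.in_memory[key])
        with open(self.on_disk[key], "rb") as f:
            return pickle.loads(f.read())

    def location(self, key):
        return "memory" if key in self.in_memory else "disk"


cache = SpillingCache(memory_budget_bytes=1000)
cache.put("small", list(range(10)))      # fits within the budget
cache.put("large", list(range(10000)))   # exceeds the budget, goes to disk
print(cache.location("small"), cache.location("large"))
```

In this sketch a lookup always succeeds once `put` has returned, which is the predictability argument: the caller never has to fall back to lineage-based recomputation because the cache evicted something behind its back.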

> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
> I think this is an operation that will be needed more prominently. Defining a point where
> one long logical program is broken into different executions.

This message was sent by Atlassian JIRA
