flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
Date Mon, 29 Feb 2016 16:34:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172094#comment-15172094
] 

ASF GitHub Bot commented on FLINK-3257:
---------------------------------------

Github user senorcarbone commented on the pull request:

    https://github.com/apache/flink/pull/1668#issuecomment-190279441
  
    You can find an alternative version using `ListState` in the following branch:
    https://github.com/senorcarbone/flink/commits/ftloopsalt
    So I noticed that this version is quite **slower** than the one with custom operator state
but it can support larger states apparently.
    
    I am (ab)using the PartitionedState to store the ListState in the same key, as @gyfora
suggested since it is the only way to obtain the nice representations at the moment. It would
be nice to have them available for operator state snapshots as well - @aljoscha have you thought
about it? When there is free time (after the release) it would be nice to see what @aljoscha
and @StephanEwen think of the two takes as well. No hurries, just take a look when you have
time!
    
    The two annoying issues I noticed during testing and we need to check soon are the following:
    
    - The overhead of transmitting and finally delivering a barrier from the `head` to its
consumers increases in time (for each subsequent checkpoint). That is due to having a single
queue at the beginning of the iterative part of the job. Events coming from the backedge are
pushed further behind the input queue.  It would be nice to have take events in round robin
among the two input gates (iteration source, regular input). Otherwise, checkpoints in iterative
jobs can be really prolonged in time due to this.
    
    - We need a proper way to deal with deadlocks. I removed the part where we discard events
in the tail upon timeout since that boils down to at most once semantics. This PR is not solving
deadlocks but I think we should find a graceful way to tackle them. (@uce, any ideas? )


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3257
>                 URL: https://issues.apache.org/jira/browse/FLINK-3257
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution graph. An alternative
scheme can potentially include records in-transit through the back-edges of a cyclic execution
graph (ABS [1]) to achieve the same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as follows along
the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start block output
and start upstream backup of all records forwarded from the respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch barrier
to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource should finalize
the snapshot, unblock its output and emit all records in-transit in FIFO order and continue
the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected snapshot first
and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but this can
be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message