flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3133) Introduce collect()/count()/print() methods in DataStream API
Date Thu, 24 Aug 2017 15:26:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140168#comment-16140168
] 

Aljoscha Krettek commented on FLINK-3133:
-----------------------------------------

Quick note: I'm not against having async execution, in fact I would really like to have this.

I don't think any method that streams back data to the program is feasible. Mostly because
of failures. In a lot of tests we introduce artificial failures and ensure that the program
still computes the expected result. How would that work when collecting the results of a sink
back to the program. In my opinion this would only work if the assertion is verified somewhere
in the job, because that would also get restarted in case of failure.

> Introduce collect()/count()/print() methods in DataStream API
> -------------------------------------------------------------
>
>                 Key: FLINK-3133
>                 URL: https://issues.apache.org/jira/browse/FLINK-3133
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>    Affects Versions: 0.10.0, 1.0.0, 0.10.1
>            Reporter: Maximilian Michels
>            Assignee: Evgeny Kincharov
>             Fix For: 1.0.0
>
>
> The DataSet API's methods {{collect()}}, {{count()}}, and {{print()}} should be mirrored
to the DataStream API. 
> The semantics of the calls are different. We need to be able to sample parts of a stream,
e.g. by supplying a time period in the arguments to the methods. Users should use the {{JobClient}}
to retrieve the results.
> {code:java}
> StreamExecutionEnvironment env = StramEnvironment.getStreamExecutionEnvironment();
> DataStream<DataType> streamData = env.addSource(..).map(..);
> JobClient jobClient = env.executeWithControl();
> Iterable<DataType> sampled = jobClient.sampleStream(streamData, Time.seconds(5));
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message