beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Etienne Chauchot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-849) Redesign PipelineResult API
Date Wed, 10 May 2017 08:22:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004274#comment-16004274
] 

Etienne Chauchot commented on BEAM-849:
---------------------------------------

 Here are the differences I have observed in streaming pipelines termination:

- direct runner: when the output watermarks of all of its PCollections progress to +infinity

- apex runner: when the output watermarks of all of its PCollections progress to +infinity

- dataflow runner: when the output watermarks of all of its PCollections progress to +infinity

- spark runner: streaming pipelines do not terminate unless timeout is set in pipelineResult.waitUntilFinish()

- flink runner: streaming pipelines do not terminate unless timeout is set in pipelineResult.waitUntilFinish()
(thanks to Aljoscha for timeout support PR https://github.com/apache/beam/pull/2915#pullrequestreview-37090326)


=> Is the direct/apex/dataflow behavior the correct "beam model" behavior?


I know that, at least for spark (mails in this thread), there is no easy way to know that
we're done reading a source, so it might be very difficult (at least for this runner) to unify
toward +infinity behavior if it is chosen as the standard behavior.


> Redesign PipelineResult API
> ---------------------------
>
>                 Key: BEAM-849
>                 URL: https://issues.apache.org/jira/browse/BEAM-849
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses waitUntilFinish() and cancel().

> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message