beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-849) Redesign PipelineResult API
Date Wed, 01 Mar 2017 19:21:45 GMT


Eugene Kirpichov commented on BEAM-849:

I disagree that "unbounded pipelines" can't finish successfully.

- Dataflow runner supports draining of pipelines, which leads to successful termination.
- It is possible to run a pipeline like Create.of(1, 2, 3) + ParDo(do nothing) using a streaming
runner, and it should terminate rather than hang.
- One might ask "why run such a pipeline with a streaming runner", but it makes a lot more
sense if the ParDo is splittable. E.g. Create.of(filename) + ParDo(tail file) + ParDo(process
records) could use the low-latency capabilities of a streaming runner, but successfully terminate
when the file is somehow "finalized". As a more mundane example - tests in
should pass in streaming runners as well as batch runners.
- "Unbounded pipeline" is in general not a Beam concept - we should have a batch/streaming-agnostic
meaning of "finished" in "waitUntilFinished". I propose the one that Dataflow runner uses
for deciding when drain is completed: "all watermarks have progressed to infinity".

> Redesign PipelineResult API
> ---------------------------
>                 Key: BEAM-849
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
> Current state: 
> Jira addresses waitUntilFinish() and cancel().

> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs

This message was sent by Atlassian JIRA

View raw message