beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-849) Redesign PipelineResult API
Date Wed, 01 Mar 2017 21:53:45 GMT


Eugene Kirpichov commented on BEAM-849:

On your first paragraph: yes, I'm not saying that waitUntilFinish should trigger the termination
- I'm just saying that such a pipeline must terminate regardless of whether it is a "streaming
runner" or not (and this is currently not the case, e.g., for the Dataflow runner); so if
somebody calls waitUntilFinish, this call should wait for that termination and complete as
well. I guess, though, at this point we're discussing "when should pipelines terminate" rather
than "what should the API be for detecting that".

In the example I gave, the "end" is not necessarily known ahead of execution. E.g. imagine
a use case where we continually tail the file and stream data from it until the file is marked
with a read-only attribute. It might get marked soon, tomorrow, or never at all - then we
should keep running and processing new records as they arrive; but when it's marked read-only,
the pipeline should terminate.

Unbounded collections are part of the SDK, but unbounded pipelines are not. I guess one could
introduce terminology that an unbounded pipeline is a pipeline that has at least one unbounded
collection?... but again, it seems like the only use case for that would be when a runner
that only supports bounded collections validates whether a pipeline being submitted satisfies

Ideally I would like to stop using terminology such as "batch" and "streaming" altogether,
except when referring to a particular runner ("batch Dataflow runner", "streaming Spark runner").

> Redesign PipelineResult API
> ---------------------------
>                 Key: BEAM-849
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
> Current state: 
> Jira addresses waitUntilFinish() and cancel().

> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs

This message was sent by Atlassian JIRA

View raw message