beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-849) Redesign PipelineResult API
Date Wed, 01 Mar 2017 21:53:45 GMT

    [ https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891137#comment-15891137
] 

Eugene Kirpichov commented on BEAM-849:
---------------------------------------

On your first paragraph: yes, I'm not saying that waitUntilFinish should trigger the termination
- I'm just saying that such a pipeline must terminate regardless of whether it is a "streaming
runner" or not (and this is currently not the case, e.g., for the Dataflow runner); so if
somebody calls waitUntilFinish, this call should wait for that termination and complete as
well. I guess, though, at this point we're discussing "when should pipelines terminate" rather
than "what should the API be for detecting that".

In the example I gave, the "end" is not necessarily known ahead of execution. E.g. imagine
a use case where we continually tail the file and stream data from it until the file is marked
with a read-only attribute. It might get marked soon, tomorrow, or never at all - then we
should keep running and processing new records as they arrive; but when it's marked read-only,
the pipeline should terminate.

Unbounded collections are part of the SDK, but unbounded pipelines are not. I guess one could
introduce terminology that an unbounded pipeline is a pipeline that has at least one unbounded
collection?... but again, it seems like the only use case for that would be when a runner
that only supports bounded collections validates whether a pipeline being submitted satisfies
this.

Ideally I would like to stop using terminology such as "batch" and "streaming" altogether,
except when referring to a particular runner ("batch Dataflow runner", "streaming Spark runner").

> Redesign PipelineResult API
> ---------------------------
>
>                 Key: BEAM-849
>                 URL: https://issues.apache.org/jira/browse/BEAM-849
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses waitUntilFinish() and cancel().

> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message