beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-849) Redesign PipelineResult API
Date Wed, 01 Mar 2017 19:52:45 GMT

    [ https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890915#comment-15890915
] 

Eugene Kirpichov commented on BEAM-849:
---------------------------------------

Sorry, I don't understand the question. This use case - representing reading from a file,
including a continually growing file, as a file-reading splittable ParDo applied to a PCollection
of filenames - is one of the main motivating use cases of splittable DoFn https://s.apache.org/splittable-do-fn
; all I'm saying here is that, even if the file eventually stops growing forever and we would
like the pipeline to terminate in that case, this makes sense regardless of whether the runner
is a "batch" or a "streaming" runner. And a user may want to choose to run this pipeline using
their favorite runner in "streaming" mode for various reasons; one being that a "streaming"
runner might process the data from the file immediately as it appears, whereas a "batch" runner
might think it's ok to wait for the whole file to appear and then process it. This is of course
runner-specific; my main point is that there is no such thing as "unbounded pipelines" in
the Beam model, and semantics of waitForFinish() should be defined in terms of the Beam model
and treated equally by all runners regardless of whether the runner has a distinction between
"batch/streaming" modes.

> Redesign PipelineResult API
> ---------------------------
>
>                 Key: BEAM-849
>                 URL: https://issues.apache.org/jira/browse/BEAM-849
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses waitUntilFinish() and cancel().

> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message