beam-commits mailing list archives

From "Mark Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
Date Fri, 02 Sep 2016 21:26:20 GMT

     [ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Liu updated BEAM-604:
--------------------------
    Description: 
Currently, a streaming job with bounded input can't be terminated automatically, and
TestDataflowRunner can't handle this case. TestDataflowRunner needs to be updated so that
streaming integration tests such as WindowedWordCountIT can run with it.

Implementation:
Query the watermark of each step, wait until all watermarks are set to MAX, then cancel the job.
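
A minimal sketch of the "all watermarks at MAX" check, for illustration only. The class name,
the map of step name to watermark, and the MAX_WATERMARK stand-in are all assumptions; they are
not Beam or Dataflow APIs, and how the watermarks are actually queried from the service is left
out.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Beam or the Dataflow SDK. */
final class WatermarkCheck {
  /** Stand-in for the runner's "MAX" watermark value (an assumption). */
  static final long MAX_WATERMARK = Long.MAX_VALUE;

  private WatermarkCheck() {}

  /**
   * Returns true only if at least one step is reported and every step's
   * watermark has reached MAX_WATERMARK.
   */
  static boolean allWatermarksAtMax(Map<String, Long> watermarkByStep) {
    if (watermarkByStep.isEmpty()) {
      // Nothing reported yet: do not treat the job as finished.
      return false;
    }
    for (Long watermark : watermarkByStep.values()) {
      if (watermark == null || watermark < MAX_WATERMARK) {
        return false;
      }
    }
    return true;
  }
}
```

A caller would poll this check and cancel the job once it returns true.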

Update:
As suggested by [~peihe0@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish.
Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and
be canceled automatically when their watermarks reach the max value. The Dataflow runners can
also stay simple, without having to handle batch and streaming as two separate cases.

Update:
The pipeline author should have control over whether and when to cancel a streaming job. The
test framework is a better place to auto-cancel a streaming test job when certain conditions
are met, rather than waitUntilFinish().
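
One way the test-framework side could look, as a rough sketch under stated assumptions: the
Job interface, the class name, and the polling parameters below are hypothetical and are not
taken from TestDataflowRunner. The condition is supplied by the test (for example, a
max-watermark check), and the job is always canceled before returning so that a streaming test
job is never left running.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test-side helper for illustration; not part of Beam. */
final class StreamingTestCanceller {
  /** Hypothetical minimal job handle; stands in for whatever the runner returns. */
  interface Job {
    void cancel();
  }

  private StreamingTestCanceller() {}

  /**
   * Polls the condition up to maxPolls times, sleeping between unsuccessful polls,
   * then cancels the job. Returns true if the condition was met before polling stopped.
   */
  static boolean cancelWhen(
      Job job, BooleanSupplier condition, int maxPolls, long pollIntervalMillis) {
    boolean conditionMet = false;
    try {
      for (int i = 0; i < maxPolls && !conditionMet; i++) {
        conditionMet = condition.getAsBoolean();
        if (!conditionMet) {
          Thread.sleep(pollIntervalMillis);
        }
      }
    } catch (InterruptedException e) {
      // Preserve the interrupt status and stop polling early.
      Thread.currentThread().interrupt();
    } finally {
      // Cancel in every case: condition met, polls exhausted, or interrupted.
      job.cancel();
    }
    return conditionMet;
  }
}
```

Keeping the cancel decision in a helper like this leaves waitUntilFinish() unchanged for pipeline
authors, which is the point of the update above.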


  was:
Currently, a streaming job with bounded input can't be terminated automatically, and
TestDataflowRunner can't handle this case. TestDataflowRunner needs to be updated so that
streaming integration tests such as WindowedWordCountIT can run with it.

Implementation:
Query the watermark of each step, wait until all watermarks are set to MAX, then cancel the job.

Update:
As suggested by [~peihe0@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish.
Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and
be canceled automatically when their watermarks reach the max value. The Dataflow runners can
also stay simple, without having to handle batch and streaming as two separate cases.

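Update:
1. The pipeline author should have control over whether and when to cancel a streaming job. The
ideal way to do this is:
{code}
// Start the job, block until it reports completion, then cancel it explicitly.
PipelineResult job = pipeline.run();
job.waitUntilFinish();
job.cancel();
{code}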



> Use Watermark Check Streaming Job Finish in TestDataflowRunner
> --------------------------------------------------------------
>
>                 Key: BEAM-604
>                 URL: https://issues.apache.org/jira/browse/BEAM-604
>             Project: Beam
>          Issue Type: Improvement
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Minor
>
> Currently, a streaming job with bounded input can't be terminated automatically, and
> TestDataflowRunner can't handle this case. TestDataflowRunner needs to be updated so that
> streaming integration tests such as WindowedWordCountIT can run with it.
> Implementation:
> Query the watermark of each step, wait until all watermarks are set to MAX, then cancel the
> job.
> Update:
> As suggested by [~peihe0@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish.
> Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and
> be canceled automatically when their watermarks reach the max value. The Dataflow runners can
> also stay simple, without having to handle batch and streaming as two separate cases.
> Update:
> The pipeline author should have control over whether and when to cancel a streaming job.
> The test framework is a better place to auto-cancel a streaming test job when certain
> conditions are met, rather than waitUntilFinish().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
