flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cam Mach <cammac...@gmail.com>
Subject Re: Understanding watermark
Date Wed, 15 Jan 2020 07:53:44 GMT
Hi Guowei,

Thanks for your response.

What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?

Or do you mean the Web UI only show the input watermarks of every operator,
but since the source doesn't have input watermark show it show "No
Watermark" ? And we should have output watermark for source?

And, yes we want to understand when we should expect to see watermarks for
our "combined" sources (bounded and un-bounded) for our pipeline?

If you can be more directly, it would be very helpful.

Thanks,

On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <guowei.mgw@gmail.com> wrote:

> Hi, Cam,
> I think you might want to know why the web page does not show the
> watermark of the source.
> Currently, the web only shows the "input" watermark. The source only
> outputs the watermark so the web shows you that there is "No Watermark".
>  Actually Flink has "output" watermark metrics. I think Flink should also
> show this information on the web. Would you mind open a Jira to track this?
>
>
> Best,
> Guowei
>
>
> Cam Mach <cammach84@gmail.com> 于2020年1月15日周三 上午4:05写道:
>
>> Hi Till,
>>
>> Thanks for your response.
>>
>> Our sources are S3 and Kinesis. We have run several tests, and we are
>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>> at that point, our pipeline has watermarks for other operators, but not the
>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>> have watermark for the source as well, right?
>>
>>  Attached is snapshot of our pipeline.
>>
>> [image: image.png]
>>
>> Thanks
>>
>>
>>
>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <trohrmann@apache.org>
>> wrote:
>>
>>> Hi Cam,
>>>
>>> could you share a bit more details about your job (e.g. which sources
>>> are you using, what are your settings, etc.). Ideally you can provide a
>>> minimal example in order to better understand the program.
>>>
>>> From a high level perspective, there might be different problems: First
>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>> the job's operator have already terminated iirc. But your description
>>> points rather into the direction that your bounded source does not
>>> terminate. So maybe you are reading a file via
>>> StreamExecutionEnvironment.createFileInput
>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>> tell without a better understanding of your job.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <cammach84@gmail.com> wrote:
>>>
>>>> Hello Flink expert,
>>>>
>>>> We have a pipeline that read both bounded and unbounded sources and our
>>>> understanding is that when the bounded sources complete they should get a
>>>> watermark of +inf and then we should be able to take a savepoint and safely
>>>> restart the pipeline. However, we have source that never get watermarks and
>>>> we are confused as to what we are seeing and what we should expect
>>>>
>>>>
>>>> Cam Mach
>>>> Software Engineer
>>>> E-mail: cammach84@gmail.com
>>>> Tel: 206 972 2768
>>>>
>>>>

Mime
View raw message