flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markus Klein" <klein.marku...@gmx.de>
Subject Aw: Re: Combine two independant streams
Date Wed, 01 Mar 2017 07:35:51 GMT
<html><head></head><body><div style="font-family: Verdana;font-size:
12.0px;"><div>
<div>Hi Fabian,</div>

<div>&nbsp;</div>

<div>thanks for your very goog explanation. However, I don&#39;t exactly know how
to increase the watermark by myself. Do you have an example for me? Do I have to override
the getCurrentWatermark method?</div>

<div>&nbsp;</div>

<div>Thanks,</div>

<div>Markus</div>

<div>&nbsp;
<div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px
solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div style="margin:0 0 10px 0;"><b>Gesendet:</b>&nbsp;Dienstag, 28.
Februar 2017 um 20:36 Uhr<br/>
<b>Von:</b>&nbsp;&quot;Fabian Hueske&quot; &lt;fhueske@gmail.com&gt;<br/>
<b>An:</b>&nbsp;user@flink.apache.org<br/>
<b>Betreff:</b>&nbsp;Re: Combine two independant streams</div>

<div name="quoted-content">
<div>
<div>
<div>In event-time mode, operators compute their internal time from watermarks.</div>
Depending on how watermarks are generated, their time only increases if records with later
timestamps are processed. If no records arrive, no new watermarks are generated and the event-time
does not increase.<br/>
<br/>
Since you want to reprocess offline data, you cannot use processing time which uses the wall
clock time of the processing machines.</div>

<div>Instead you could use a custom periodic watermark that slowly increases the time
even if no new data arrives. However, you should be careful, because this could also lead
to late arriving events being dropped. The allowedLateness parameter can help to mitigate
the problem.<br/>
&nbsp;</div>

<div>Hope that helps,</div>

<div>Fabian</div>
</div>

<div class="gmail_extra">&nbsp;
<div class="gmail_quote">2017-02-28 18:50 GMT+01:00 Markus <span>&lt;<a
href="mailto:klein.markus90@gmx.de" onclick="parent.window.location.href=&#39;klein.markus90@gmx.de&#39;;
return false;" target="_blank">klein.markus90@gmx.de</a>&gt;</span>:

<blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex;border-left: 1.0px rgb(204,204,204)
solid;padding-left: 1.0ex;">
<div>
<div class="m_-714483787166973542moz-cite-prefix">Hi Fabian,<br/>
<br/>
yeah, that&#39;s basically it. The events window gets closed only when a newer event arrives
(after 10 seconds window).<br/>
Can I tell Flink to close the event window at timeWindow.getEnd() even if no newer event arrives?<br/>
<br/>
Thanks,<br/>
Markus<br/>
<br/>
Am 28.02.17 um 17:19 schrieb Fabian Hueske:</div>

<div>
<div class="h5">
<blockquote>
<div>
<div>
<div>Hi Markus,<br/>
&nbsp;</div>
I&#39;m not sure I understood the issue with the second approach.<br/>
Is it that the stream of application events might be empty for some time such that its event
time is not increasing?</div>

<div><br/>
Best, Fabian</div>
</div>

<div class="gmail_extra">&nbsp;
<div class="gmail_quote">2017-02-28 17:02 GMT+01:00 Markus Klein <span>&lt;<a
href="mailto:klein.markus90@gmx.de" onclick="parent.window.location.href=&#39;klein.markus90@gmx.de&#39;;
return false;" target="_blank">klein.markus90@gmx.de</a>&gt;</span>:

<blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex;border-left: 1.0px rgb(204,204,204)
solid;padding-left: 1.0ex;">
<div class="m_-714483787166973542HOEnZb">
<div class="m_-714483787166973542h5">Hello Flink Community,<br/>
&nbsp;<br/>
I have a question regarding combining two independant streams.&nbsp;<br/>
The first stream is a stream of events with metrics information. It occurs every 10 seconds.
What I want is to join a second stream with events from an application. The result should
be an event with the metrics and the events that happened the last 10 seconds.<br/>
&nbsp;<br/>
So my first approacch was to generate an ID which will be increased after every metric event.
This ID will be added to the application events and of cours for the current metric event.
This works somehow good for live events but for recalculating past events the two streams
have to start at the same point in event time.<br/>
&nbsp;<br/>
The second approach was to generate a time window of 10 seconds for the application events
and for the metrics and set the window end time as a key because flink windows end at e.g.
11:01:10, then 11:01:20 and so on. But this approach works only for past events because flink
needs another application event to know that 10 seconds have passed for the application events
window.<br/>
&nbsp;<br/>
I hope you guys understand the problem. Is there a way two combine them in a nice way? I don&#39;t
want to generate empty &quot;heartbeats&quot; for the application event stream.<br/>
&nbsp;<br/>
Thanks for your help.<br/>
&nbsp;<br/>
Greetings<br/>
Markus</div>
</div>
</blockquote>
</div>
</div>
</blockquote>

<p>&nbsp;</p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div></div></body></html>

Mime
View raw message