flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Window based on tuple timestamps
Date Sat, 19 Sep 2015 17:28:14 GMT
Hi,
yes, setting the parallelism to 1 might solve it but it will also cripple
the performance. During next week the first part of our rework of the
windowing system should make it into master, so then you would be able to
test your programs with that. If you are using the 0.10-SNAPSHOT version.

I'm sorry we don't have a better answer right now but we are working hard
on fixing these things.

Cheers,
Aljoscha

On Sat, 19 Sep 2015 at 12:53 Philipp Goetze <philipp.goetze@tu-ilmenau.de>
wrote:

> Hi Aljoscha,
>
> on the big data sets I am not quite sure if they are completely ordered,
> but my test data (just 10 records) is sorted. There are only two different
> timestamps. Sometimes I end up with two windows (which is correct) and
> sometimes only one. So this is not deterministic. Should I maybe set the
> parallelism to 1 at some place?
>
> Best Regards,
> Philipp
>
>
> On 18.09.2015 17:05, Aljoscha Krettek wrote:
>
> Hi Philipp,
> am I correct to assume that your tuples do not arrive in the order of the
> timestamp that you extract. Unfortunately, for that case the current
> windowing implementation does not work correctly. We are working hard on
> fixing this for the upcoming 0.10 release, though. If you are interested in
> what we are doing there you can check out the design documents in the Flink
> wiki:
>
> https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Streams
> https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
>
> Cheers,
> Aljoscha
>
> On Fri, 18 Sep 2015 at 13:19 Philipp Goetze <philipp.goetze@tu-ilmenau.de>
> wrote:
>
>> Hello again,
>>
>> another question from me =). Could you provide an example on how to
>> correctly use windows based on timestamps on the tuples (i.e. non-realtime)?
>>
>> As a simple example I tried something like this:
>>
>> val w4 = rdfSource.window(Time.of(200000L, customTimestamp,
>> 1219625100000L)).mapWindow(customMap _).flatten().print()
>>
>> In the customMap I just collect all tuples in one list for each window.
>> However, this does not work correctly as sometimes all the records are
>> collected in one list, although there should be two windows for the example
>> data.
>>
>> Best Regards,
>> Philipp
>>
>
>

Mime
View raw message