flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: GroupBy result delay
Date Tue, 23 Jul 2019 09:48:44 GMT
Hi Fanbin,

The delay is most likely caused by the watermark delay.
A window is computed when the watermark passes the end of the window. If
you configured the watermark to lag 10 minutes behind the current max
timestamp (probably to account for out-of-order data), then each window will
be computed with a delay of approximately 10 minutes.
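
To make the arithmetic concrete, here is a rough sketch in plain Python (not
Flink code). It assumes a 10-minute watermark lag, which is only a guess based
on the delay you observed, together with the 1-minute window and 30-second
slide from your query. The helper names are made up for illustration, and
millisecond-level off-by-one details in how Flink compares watermarks to window
ends are ignored.

```python
from datetime import datetime, timedelta

# Assumed values for illustration only: the 10-minute lag is a guess from
# this thread, not something read from the actual job configuration.
WATERMARK_DELAY = timedelta(minutes=10)
WINDOW_SIZE = timedelta(minutes=1)
WINDOW_SLIDE = timedelta(seconds=30)

EPOCH = datetime(1970, 1, 1)


def hop_window_ends(event_time):
    """Return the end timestamps of all hopping windows containing event_time."""
    # Align to the latest window start that is <= event_time.
    offset = (event_time - EPOCH) % WINDOW_SLIDE
    start = event_time - offset
    ends = []
    # Walk backwards over every window start that still covers the event.
    while start > event_time - WINDOW_SIZE:
        ends.append(start + WINDOW_SIZE)
        start -= WINDOW_SLIDE
    return sorted(ends)


def max_timestamp_needed(window_end):
    """Max event timestamp that must be seen before the watermark
    (max timestamp minus WATERMARK_DELAY) reaches window_end."""
    return window_end + WATERMARK_DELAY


if __name__ == "__main__":
    event = datetime(2019, 7, 23, 0, 0, 10)
    for end in hop_window_ends(event):
        print("window ending", end, "fires once an event with timestamp >=",
              max_timestamp_needed(end), "has arrived")
```

In this sketch an event at 00:00:10 falls into the windows ending at 00:00:30
and 00:01:00, and neither can fire before the source has seen a timestamp of
00:10:30 or 00:11:00 respectively. Note that the delay is measured in event
time, so it also depends on new events continuing to arrive.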

Best, Fabian

On Tue, 23 Jul 2019 at 02:00, Fanbin Bu <fanbin.bu@coinbase.com> wrote:

> Hi,
> I have a Flink sql streaming job defined by:
>
> SELECT
>   user_id
>   , hop_end(created_at, interval '30' second, interval '1' minute) as bucket_ts
>   , count(name) as count
> FROM event
> WHERE name = 'signin'
> GROUP BY
>   user_id
>   , hop(created_at, interval '30' second, interval '1' minute)
>
>
> There is a noticeable delay in the GROUP BY operator. For example, I only
> see a record emitted 10 minutes after the input record is received. See the
> attached picture.
>
> [image: image.png]
>
> I expected to see the GROUP BY result after one minute, since the
> sliding window size is 1 minute and the slide is 30 seconds.
>
> There is no such issue if I run the job locally in IntelliJ. However, I
> run into the issue above when I run the job on EMR (Flink version 1.7).
>
> Can anybody give me a clue as to what could be wrong?
> Thanks,
>
> Fanbin
>
