flink-user mailing list archives

From Vinod Mehra <vme...@lyft.com>
Subject OVER operator filtering out records
Date Fri, 23 Aug 2019 22:09:04 GMT
We have a SQL-based Flink job that consumes a very low-volume stream (1
or 2 events every few hours):






    SELECT user_id,
        COUNT(*) OVER (PARTITION BY user_id ORDER BY rowtime
            RANGE INTERVAL '30' DAY PRECEDING) AS count_30_days,
        COALESCE(occurred_at, logged_at) AS latency_marker,
        rowtime
    FROM event_foo
    WHERE user_id IS NOT NULL

According to the Flink dashboard, the OVER operator seems to filter out all
events (records received = <non-zero-number>, records sent = 0).

The operator looks like this:

*over: (PARTITION BY: $1, ORDER BY: rowtime, RANGEBETWEEN 2592000000
PRECEDING AND CURRENT ROW, select: (rowtime, $1, $2, COUNT(*) AS w0$o0)) ->
select: ($1 AS user_id, w0$o0 AS count_30_days, $2 AS latency_marker,
rowtime) -> to: Tuple2 -> Filter -> group_counter_count_30d.1.sqlRecords ->
sample_without_formatter*

I know that the OVER operator can discard late-arriving events, but these
events are definitely not arriving late. The watermark for all operators
stays at 0 because no events are being output.
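
To make the question concrete, here is a rough mental model of an event-time
OVER window, written as a small Python sketch. It is an assumption about the
general mechanism (rows are held until the watermark reaches their timestamp,
and rows at or behind the watermark are dropped as late), not Flink's actual
implementation. The class name OverCountModel and the initial watermark of -1
are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OverCountModel:
    """Toy model of COUNT(*) OVER (PARTITION BY key ORDER BY rowtime RANGE ... PRECEDING)."""
    range_ms: int
    watermark: int = -1                           # hypothetical: no watermark seen yet
    pending: dict = field(default_factory=dict)   # key -> timestamps waiting for the watermark
    history: dict = field(default_factory=dict)   # key -> timestamps already emitted

    def on_event(self, key, ts):
        # A row at or behind the current watermark is treated as late and dropped.
        if ts <= self.watermark:
            return "dropped_late"
        self.pending.setdefault(key, []).append(ts)
        return "buffered"

    def on_watermark(self, wm):
        # Watermarks only move forward; emit every buffered row they now cover.
        self.watermark = max(self.watermark, wm)
        out = []
        for key, buf in self.pending.items():
            ready = sorted(t for t in buf if t <= self.watermark)
            buf[:] = [t for t in buf if t > self.watermark]
            seen = self.history.setdefault(key, [])
            seen.extend(ready)
            for ts in ready:
                # RANGE semantics: count rows whose timestamp lies in [ts - range, ts].
                count = sum(1 for t in seen if ts - self.range_ms <= t <= ts)
                out.append((key, ts, count))
        return out


THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000  # 2592000000, as in the operator plan

model = OverCountModel(range_ms=THIRTY_DAYS_MS)
status = model.on_event("u1", 1000)       # row is received and buffered
stalled = model.on_watermark(0)           # watermark has not reached the row: nothing sent
released = model.on_watermark(1000)       # watermark reaches the row: it is emitted
late = model.on_event("u1", 500)          # behind the watermark: dropped
```

In this model a watermark that never advances gives exactly "records received
> 0, records sent = 0": rows are buffered rather than dropped, and nothing in
it depends on a minimum number of events.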

We have an identical SQL job running against a high-volume stream, and it
works fine: watermarks progress and events are delivered in a timely
manner.

Any idea what could be going wrong? Are the events being buffered while the
operator waits for a certain number of events? If so, what is the threshold?

Thanks,
Vinod
