flink-user mailing list archives

From Ron Crocker <rcroc...@newrelic.com>
Subject Finding things not seen in the last window
Date Sat, 30 Sep 2017 01:52:13 GMT
Hi -

I have a colleague who is trying to write a Flink job that will determine deltas from period
to period. Let’s say the periods are 1 minute long. What he would like to do is report in minute
2 those things that are new since minute 1, then in minute 3 report those things that
were not seen in minute 1 or minute 2, and so on.

For example, suppose the stream looks like
minute | name
=======|=======
     1 | abc
     1 | def
     2 | abc
     2 | ghi
     3 | abc
     3 | def
     4 | ghi
     4 | jkl

What we would like to report is:
minute | count | names
=======|=======|=======
     1 |     2 | abc, def
     2 |     1 | ghi
     3 |     0 |
     4 |     1 | jkl

In minute 2, abc was already seen but ghi is new, so it gets reported out as new. In minute
3, abc and def have already been seen, so there are no new names, and again in minute 4 ghi
has been seen but jkl is new, so we report out the 1 new name.
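
To make the intended semantics concrete, here is a minimal plain-Python sketch of the computation over the example above. It is not Flink code: the function name new_names_per_minute and the in-memory set are only for illustration, and it assumes events arrive ordered by minute. In a real job the "seen" set would presumably have to live in some kind of managed state.

```python
from collections import OrderedDict

def new_names_per_minute(events):
    """events: iterable of (minute, name) pairs, assumed ordered by minute."""
    seen = set()            # every name observed so far, across all minutes
    report = OrderedDict()  # minute -> names first seen in that minute
    for minute, name in events:
        fresh = report.setdefault(minute, [])  # ensures empty minutes still get a row
        if name not in seen:
            seen.add(name)
            fresh.append(name)
    return [(minute, len(names), names) for minute, names in report.items()]

events = [(1, "abc"), (1, "def"), (2, "abc"), (2, "ghi"),
          (3, "abc"), (3, "def"), (4, "ghi"), (4, "jkl")]

for minute, count, names in new_names_per_minute(events):
    print(minute, count, ", ".join(names))
```

Run on the sample stream, this yields the same rows as the table above, including the zero-count row for minute 3.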

I’m struggling to help him and thought someone here might have some ideas. I have thought about
merging two streams (the stream of new things and the stream of the full set seen so far)
but haven’t tried that yet. 

I welcome any of your inputs.

Thanks!

Ron
—
Ron Crocker
Principal Engineer & Architect
( ( •)) New Relic
rcrocker@newrelic.com
M: +1 630 363 8835

