flink-user mailing list archives

From Oytun Tez <oy...@motaword.com>
Subject Re: question for handling db data
Date Fri, 26 Jul 2019 14:12:27 GMT
imagine an operator, a ProcessFunction, with 2 incoming inputs:
geofences via broadcast,
user locations via a normal data stream

geofence updates and user location updates will come separately into this
single operator.

when a geofence update comes in via broadcast, the operator will update its
state (e.g. this.geofenceListState) with the new geofence rules. this
happens in the processBroadcastElement() method.

whenever the geofence data is updated, it will arrive at this operator, in
processBroadcastElement(), and you will put the new geofence list into the
operator's state.

when a user location update comes to the operator via the regular stream,
you will access this.geofenceListState, do your calculations, and collect()
whatever you need to emit at the end of the computation.

the regular stream arrives, this time, at the processElement() method.
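the two callbacks above can be sketched in plain Java (no Flink
dependencies; in real code this would be a KeyedBroadcastProcessFunction —
all class and method names below are illustrative stand-ins):

```java
import java.util.ArrayList;
import java.util.List;

// plain-Java model of the operator described above; the two methods stand in
// for processBroadcastElement() and processElement().
class GeofenceMatcher {
    // stands in for the operator's broadcast state (this.geofenceListState);
    // each fence is a bounding box: {minX, minY, maxX, maxY}
    private List<double[]> geofences = new ArrayList<>();

    // mirrors processBroadcastElement(): overwrite the state with new rules
    void onGeofenceUpdate(List<double[]> newFences) {
        this.geofences = newFences;
    }

    // mirrors processElement(): match a location against the current fences
    // and return the indices of the fences that contain it (the collect() part)
    List<Integer> onLocation(double x, double y) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < geofences.size(); i++) {
            double[] f = geofences.get(i);
            if (x >= f[0] && y >= f[1] && x <= f[2] && y <= f[3]) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

in Flink itself you would keep the fence list in broadcast state via a
MapStateDescriptor rather than a plain field, so it is checkpointed along
with the rest of the operator state.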


a geofence update will not affect the elements previously collected from
processElement. but Flink will make sure all instances of this operator
across the various task managers get the same geofence update via
processBroadcastElement(), no matter whether the operator is keyed.

your processElement, meaning your user location updates, will do its
calculation with the latest geofence data from processBroadcastElement().
when the geofence data is updated, user location updates from that point on
will use the new geofence data.
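the "from that point on" behavior can be illustrated with a small timeline
sketch (plain Java, illustrative names; the fence-version string stands in
for the broadcast state):

```java
import java.util.ArrayList;
import java.util.List;

// timeline sketch: outputs emitted earlier are not retracted when the
// broadcast state changes; only elements processed afterwards see the new
// fence data.
class BroadcastTimeline {
    private String fenceVersion = "none";                   // stands in for broadcast state
    private final List<String> emitted = new ArrayList<>(); // everything collect()-ed so far

    // like processBroadcastElement(): replace the fence data
    void processBroadcastElement(String newVersion) {
        fenceVersion = newVersion;
    }

    // like processElement(): use whatever fence data is current right now
    void processElement(String location) {
        emitted.add(location + " matched against fences " + fenceVersion);
    }

    List<String> emitted() {
        return emitted;
    }
}
```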

i hope this is clearer...

Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oytun@motaword.com — www.motaword.com

On Thu, Jul 25, 2019 at 7:08 PM jaya sai <jayasai470@gmail.com> wrote:

> Hi Oytun,
> Thanks for the quick reply, will study more
> so when we have a stream, let's say in kafka, for edits, deletes and
> inserts of geofences, and add it to the flink broadcast downstream, what
> happens if processing is taking place and we update the bounds of a
> geofence?
> when will the new data take effect, how do the updates happen, and is
> there any impact on the geofence events based on the location data?
> Thank you,
> On Thu, Jul 25, 2019 at 5:53 PM Oytun Tez <oytun@motaword.com> wrote:
>> Hi Jaya,
>> Broadcast pattern may help here. Take a look at this:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html
>> You'll still keep your geofence data as a stream (depending on the data
>> and use case, maybe the whole geofence list as a single stream item) and
>> broadcast it to downstream operators, which will then hold the geofence
>> data in their state as slowly changing data (processBroadcastElement),
>> while user locations arrive at the operator regularly (processElement).
>> ---
>> Oytun Tez
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oytun@motaword.com — www.motaword.com
>> On Thu, Jul 25, 2019 at 6:48 PM jaya sai <jayasai470@gmail.com> wrote:
>>> Hello,
>>> I have a question on using flink: we have a small data set which does
>>> not change often, and another, much larger data set which we need to
>>> compare against it.
>>> let's say I have two collections in mongodb, geofence and locations.
>>> the geofence collection does not change often and is relatively small,
>>> but location data comes in at high volume from clients and we need to
>>> calculate geofence entries and exits based on the geofence and location
>>> data points.
>>> for calculating the entries and exits we were thinking of using flink
>>> CEP. but our problem is that the geofence data sometimes changes and we
>>> need to update flink's in-memory store somehow.
>>> we were thinking of bootstrapping the flink processor's memory by
>>> loading the data on initial start and subscribing to a kafka topic to
>>> listen for geofence changes, then re-pulling the data.
>>> Is this a valid approach ?
>>> Thank you,
