flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Keep Model in Operator instance up to date
Date Wed, 19 Aug 2015 08:07:39 GMT
Hey,

If it is always better to check the events against a more up-to-date model
(even if the events we are checking arrived before the update) then it is
fine to keep the model outside of the system.

In this case we need to make sure that we can push the updates to the
external system consistently. If you are using the PersistentKafkaSource
for instance it can happen that some messages are replayed in case of
failure. In this case you need to make sure that you remove duplicate
updates or have idempotent updates.

You can read about the checkpoint mechanism in the Flink website:
https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

Cheers,
Gyula
On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan <if05041@gmail.com> wrote:

> Thanks Gyula,
>
> Another question i have..
>
> > ... while external model updates would be *tricky *to keep consistent.
> Is that still the case if the Operator treat the external model as
> read-only ? We create another stream that will update the external model
> separately.
>
> Could you please elaborate more about this one ?
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>
>> In that case I would apply a map to wrap in some common type, like a n
>> Either<t1,t2> before the union.
>>
>> And then in the coflatmap you can unwrap it.
>> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan <if05041@gmail.com> wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks.
>>>
>>> However update1 and update2 have a different type. Based on my
>>> understanding, i don't think we can use union. How can we handle this one ?
>>>
>>> We like to create our event strongly type to get the domain language
>>> captured.
>>>
>>>
>>> Cheers
>>>
>>> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra <gyula.fora@gmail.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> One input of your co-flatmap would be model updates and the other input
>>>> would be events to check against the model if I understand correctly.
>>>>
>>>> This means that if your model updates come from more than one stream
>>>> you need to union them into a single stream before connecting them with the
>>>> event stream and applying the coatmap.
>>>>
>>>> DataStream updates1 = ....
>>>> DataStream updates2 = ....
>>>> DataStream events = ....
>>>>
>>>> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>>>>
>>>> Does this answer your question?
>>>>
>>>> Gyula
>>>>
>>>>
>>>> On Wednesday, August 19, 2015, Welly Tambunan <if05041@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gyula,
>>>>>
>>>>> Thanks for your response.
>>>>>
>>>>> However the model can received multiple event for update. How can we
>>>>> do that with co-flatmap as i can see the connect API only received single
>>>>> datastream ?
>>>>>
>>>>>
>>>>> > ... while external model updates would be tricky to keep consistent.
>>>>>
>>>>> Is that still the case if the Operator treat the external model as
>>>>> read-only ? We create another stream that will update the external model
>>>>> separately.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra <gyfora@apache.org>
wrote:
>>>>>
>>>>>> Hey!
>>>>>>
>>>>>> I think it is safe to say that the best approach in this case is
>>>>>> creating a co-flatmap that will receive updates on one input. The
events
>>>>>> should probably be broadcasted in this case so you can check in parallel.
>>>>>>
>>>>>> This approach can be used effectively with Flink's checkpoint
>>>>>> mechanism, while external model updates would be tricky to keep consistent.
>>>>>>
>>>>>> Cheers,
>>>>>> Gyula
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan <if05041@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We have a streaming computation that required to validate the
data
>>>>>>> stream against the model provided by the user.
>>>>>>>
>>>>>>> Right now what I have done is to load the model into flink operator
>>>>>>> and then validate against it. However the model can be updated
and changed
>>>>>>> frequently. Fortunately we always publish this event to RabbitMQ.
>>>>>>>
>>>>>>> I think we can
>>>>>>>
>>>>>>>
>>>>>>>    1. Create RabbitMq listener for model changed event from inside
>>>>>>>    the operator, then update the model if event arrived.
>>>>>>>
>>>>>>>    But i think this will create race condition if not handle
>>>>>>>    correctly and it seems odd to keep this
>>>>>>>
>>>>>>>    2. We can move the model into external in external memory
cache
>>>>>>>    storage and keep the model up to date using flink. So the
operator will
>>>>>>>    retrieve that from memory cache
>>>>>>>
>>>>>>>    3. Create two stream and using co operator for managing the
>>>>>>>    shared state.
>>>>>>>
>>>>>>>
>>>>>>> What is your suggestion on keeping the state up to date from
>>>>>>> external event ? Is there some kind of best practice for maintaining
model
>>>>>>> up to date on streaming operator ?
>>>>>>>
>>>>>>> Thanks a lot
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Welly Tambunan
>>>>>>> Triplelands
>>>>>>>
>>>>>>> http://weltam.wordpress.com
>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Welly Tambunan
>>>>> Triplelands
>>>>>
>>>>> http://weltam.wordpress.com
>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>

Mime
View raw message