flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Keep Model in Operator instance up to date
Date Thu, 20 Aug 2015 10:33:54 GMT
Hi,

I don't think I fully understand your question, could you please try to be
a little more specific?

I assume by caching you mean that you keep the current model as an operator
state. Why would you need to add new streams in this case?

I might be slow to answer as I am currently on vacation without stable
internet connection.

Cheers,
Gyula
On Thu, Aug 20, 2015 at 5:36 AM Welly Tambunan <if05041@gmail.com> wrote:

> Hi Gyula,
>
> I have another question. So if i cache something on the operator, to keep
> it up to date,  i will always need to add and connect another stream of
> changes to the operator ?
>
> Is this right for every case ?
>
> Cheers
>
> On Wed, Aug 19, 2015 at 3:21 PM, Welly Tambunan <if05041@gmail.com> wrote:
>
>> Hi Gyula,
>>
>> That's really helpful. The docs is improving so much since the last time
>> (0.9).
>>
>> Thanks a lot !
>>
>> Cheers
>>
>> On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> If it is always better to check the events against a more up-to-date
>>> model (even if the events we are checking arrived before the update) then
>>> it is fine to keep the model outside of the system.
>>>
>>> In this case we need to make sure that we can push the updates to the
>>> external system consistently. If you are using the PersistentKafkaSource
>>> for instance it can happen that some messages are replayed in case of
>>> failure. In this case you need to make sure that you remove duplicate
>>> updates or have idempotent updates.
>>>
>>> You can read about the checkpoint mechanism in the Flink website:
>>> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>>>
>>> Cheers,
>>> Gyula
>>>
>>> On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan <if05041@gmail.com>
>>> wrote:
>>>
>>>> Thanks Gyula,
>>>>
>>>> Another question i have..
>>>>
>>>> > ... while external model updates would be *tricky *to keep
>>>> consistent.
>>>> Is that still the case if the Operator treat the external model as
>>>> read-only ? We create another stream that will update the external model
>>>> separately.
>>>>
>>>> Could you please elaborate more about this one ?
>>>>
>>>> Cheers
>>>>
>>>> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra <gyula.fora@gmail.com>
>>>> wrote:
>>>>
>>>>> In that case I would apply a map to wrap in some common type, like a
n
>>>>> Either<t1,t2> before the union.
>>>>>
>>>>> And then in the coflatmap you can unwrap it.
>>>>> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan <if05041@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gyula,
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> However update1 and update2 have a different type. Based on my
>>>>>> understanding, i don't think we can use union. How can we handle
this one ?
>>>>>>
>>>>>> We like to create our event strongly type to get the domain language
>>>>>> captured.
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra <gyula.fora@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> One input of your co-flatmap would be model updates and the other
>>>>>>> input would be events to check against the model if I understand
correctly.
>>>>>>>
>>>>>>> This means that if your model updates come from more than one
stream
>>>>>>> you need to union them into a single stream before connecting
them with the
>>>>>>> event stream and applying the coatmap.
>>>>>>>
>>>>>>> DataStream updates1 = ....
>>>>>>> DataStream updates2 = ....
>>>>>>> DataStream events = ....
>>>>>>>
>>>>>>> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>>>>>>>
>>>>>>> Does this answer your question?
>>>>>>>
>>>>>>> Gyula
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, August 19, 2015, Welly Tambunan <if05041@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Gyula,
>>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>>
>>>>>>>> However the model can received multiple event for update.
How can
>>>>>>>> we do that with co-flatmap as i can see the connect API only
received
>>>>>>>> single datastream ?
>>>>>>>>
>>>>>>>>
>>>>>>>> > ... while external model updates would be tricky to
keep
>>>>>>>> consistent.
>>>>>>>> Is that still the case if the Operator treat the external
model as
>>>>>>>> read-only ? We create another stream that will update the
external model
>>>>>>>> separately.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra <gyfora@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey!
>>>>>>>>>
>>>>>>>>> I think it is safe to say that the best approach in this
case is
>>>>>>>>> creating a co-flatmap that will receive updates on one
input. The events
>>>>>>>>> should probably be broadcasted in this case so you can
check in parallel.
>>>>>>>>>
>>>>>>>>> This approach can be used effectively with Flink's checkpoint
>>>>>>>>> mechanism, while external model updates would be tricky
to keep consistent.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Gyula
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan <if05041@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We have a streaming computation that required to
validate the
>>>>>>>>>> data stream against the model provided by the user.
>>>>>>>>>>
>>>>>>>>>> Right now what I have done is to load the model into
flink
>>>>>>>>>> operator and then validate against it. However the
model can be updated and
>>>>>>>>>> changed frequently. Fortunately we always publish
this event to RabbitMQ.
>>>>>>>>>>
>>>>>>>>>> I think we can
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. Create RabbitMq listener for model changed
event from
>>>>>>>>>>    inside the operator, then update the model if
event arrived.
>>>>>>>>>>
>>>>>>>>>>    But i think this will create race condition if
not handle
>>>>>>>>>>    correctly and it seems odd to keep this
>>>>>>>>>>
>>>>>>>>>>    2. We can move the model into external in external
memory
>>>>>>>>>>    cache storage and keep the model up to date using
flink. So the operator
>>>>>>>>>>    will retrieve that from memory cache
>>>>>>>>>>
>>>>>>>>>>    3. Create two stream and using co operator for
managing the
>>>>>>>>>>    shared state.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What is your suggestion on keeping the state up to
date from
>>>>>>>>>> external event ? Is there some kind of best practice
for maintaining model
>>>>>>>>>> up to date on streaming operator ?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Welly Tambunan
>>>>>>>>>> Triplelands
>>>>>>>>>>
>>>>>>>>>> http://weltam.wordpress.com
>>>>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Welly Tambunan
>>>>>>>> Triplelands
>>>>>>>>
>>>>>>>> http://weltam.wordpress.com
>>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Welly Tambunan
>>>>>> Triplelands
>>>>>>
>>>>>> http://weltam.wordpress.com
>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>

Mime
View raw message