flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Salva Alcántara <salcantara...@gmail.com>
Subject Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter
Date Sun, 05 Apr 2020 13:00:02 GMT
Hi again Piotr,

I have further considered the mailbox executor approach and I think it will
not be enough for my purposes. Here is why:

- My state consists of models created with a third party library
- These models have their own state, which means that when I forward events
in `ProcessElement1` to these models, the model's state will be updated
accordingly.

So, what would happen if:

- A new element E is processed in `ProcessElement1` and sent to the third
party library model
- A checkpoint is taken, in particular snapshotting all the library models
in use
- The element E that was sent to the library is expected to generate an
output O result when the callback is called, but a failure happens before
that
- Application recovers from the snapshot and continue processing elements,
but the callback generating the expected output O has been lost by now, so
that output will be lost

By considering the above case, I realize that the only option for me might
be to rely on AsyncIO. However, this is far from ideal because I am not
expecting an output result for each element I send to my models. I could use
a timeout but that may slow down processing as asyncIO has a limited queue
of "active" elements. Also, most of the times, I am not expecting a result
back at all from my models (callbacks will be invoked only a few times since
my modes are detecting anomalies).

In your opinion, what would be the best approach for handling this use case?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Mime
View raw message