flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Current alternatives for async I/O
Date Tue, 11 Oct 2016 07:35:06 GMT
Hi Ken,

I think your solution should work.
You need to make sure though, that you properly manage the state of your
function, i.e., memorize all records which have been received but haven't
be emitted yet.
Otherwise records might get lost in case of a failure.

Alternatively, you can implement this as a custom operators. This would
give you full access but you would need to take care of organizing
checkpoints and other low-level issues yourself. This would also be
basically the same as implementing FLIP-12 (or a subset of it).

Best, Fabian


2016-10-09 3:31 GMT+02:00 Ken Krugler <kkrugler_lists@transpac.com>:

> Hi all,
>
> I’ve been watching the FLIP-12
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673> design
> discussion, and it looks like a promising solution for the issues we’ve got
> with needing to make asynchronous multi-threaded requests in a Flink
> operator.
>
> What’s the best workaround with current releases of Flink?
>
> One option is to have a special tickler source that broadcasts a Tuple0
> every X milliseconds, which gets connected to the real stream that feeds a
> CoFlatMap. Inside of this I’ve got queues for incoming and generated
> tuples, with a thread pool to pull from the incoming and write to the
> generated queues. When I get one of the “tickle” Tuple0s, I emit all of the
> generated tuples.
>
> There are issues with needing to bound the size of the queues, and all of
> the usual fun with thread pools, but it seems to work.
>
> Is there a better/simpler approach?
>
> Thanks,
>
> — Ken
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>

Mime
View raw message