flink-user mailing list archives

From Chen Qin <qinnc...@gmail.com>
Subject Re: multi tenant workflow execution
Date Wed, 25 Jan 2017 07:18:41 GMT
Hi Fabian,

AsyncFunction and ProcessFunction do help!

I assume the per-event timers I create in my RichProcessFunction implementation will
be part of the key-grouped state and cached in memory during runtime, right? I am
interested in this because we are targeting a large deployment with an event source
of millions of TPS, so I would like to understand checkpoint size and speed.

How about checkpointing an iteration stream? Can we achieve at-least-once
semantics in 1.2 on iteration jobs?


On Tue, Jan 24, 2017 at 2:26 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Chen,
> if you plan to implement your application on top of the upcoming Flink
> 1.2.0 release, you might find the new AsyncFunction [1] and the
> ProcessFunction [2] helpful.
> AsyncFunction can be used for non-blocking calls to external services and
> maintains the checkpointing semantics.
> ProcessFunction allows you to register and react to timers. This might be easier
> to use than a window for the 24h timeout.
> Best,
> Fabian
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html
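[Editorial note: Fabian's timer suggestion can be illustrated with a small standalone sketch of the per-key timer bookkeeping that a ProcessFunction's timer service performs. This is plain JDK code, not the Flink API; the class and method names are hypothetical, chosen only to show the semantics: keys register a firing timestamp, and advancing the watermark fires every due timer in timestamp order.]

```java
import java.util.PriorityQueue;
import java.util.function.BiConsumer;

// Standalone sketch (not the Flink API) of per-key timer bookkeeping.
public class TimerSketch {
    private static class Timer {
        final long ts;
        final String key;
        Timer(long ts, String key) { this.ts = ts; this.key = key; }
    }

    // Timers ordered by firing timestamp, earliest first.
    private final PriorityQueue<Timer> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a.ts, b.ts));

    public void registerTimer(String key, long ts) {
        queue.add(new Timer(ts, key));
    }

    // Fire every timer whose timestamp is <= the current watermark.
    public void advanceWatermark(long watermark, BiConsumer<String, Long> onTimer) {
        while (!queue.isEmpty() && queue.peek().ts <= watermark) {
            Timer t = queue.poll();
            onTimer.accept(t.key, t.ts);
        }
    }
}
```

In Flink itself this bookkeeping is part of keyed state, which is why (as Chen asks above) registered timers contribute to checkpoint size.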
> 2017-01-24 0:41 GMT+01:00 Chen Qin <qinnchen@gmail.com>:
>> Hi there,
>> I am researching running one Flink job to support customized event-driven
>> workflow executions. The use case is to support running various workflows
>> that listen to a set of Kafka topics and perform various RPC checks; a
>> user travels through multiple stages in a rule execution (workflow
>> execution). E.g.
>> kafka topic: user click stream
>> rpc checks:
>> if user is a member
>> if user has shown interest in signing up
>> workflows:
>> workflow 1: user click -> if user is a member, do A then do B
>> workflow 2: user click -> if user has shown interest in signing up, then do A;
>> otherwise wait for 60 mins and recheck; expire in 24 hours
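[Editorial note: workflow 2's recheck/expiry policy can be sketched as a small pure decision function (hypothetical names, not from the thread): given the minutes elapsed since the user click and the signup-interest flag, decide whether to proceed, recheck later, or expire.]

```java
// Hypothetical decision helper for workflow 2's timing rules:
// proceed as soon as interest is shown; otherwise recheck every
// 60 minutes until 24 hours have elapsed, then expire.
public class Workflow2Policy {
    public enum Action { DO_A, RECHECK_IN_60_MIN, EXPIRE }

    public static Action decide(long minutesSinceClick, boolean hasShownInterest) {
        if (hasShownInterest) return Action.DO_A;
        if (minutesSinceClick >= 24 * 60) return Action.EXPIRE;
        return Action.RECHECK_IN_60_MIN;
    }
}
```

In the ProcessFunction approach Fabian suggests, the recheck would be driven by registering a timer 60 minutes out and re-evaluating this decision in the timer callback.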
>> The goal, as I said, is to run workflow 1 & workflow 2 in one Flink job. My
>> initial thinking is described below.
>> The sources are a series of Kafka topics; all events go through a CoMap that
>> caches the event -> rules mapping and fans out to multiple {rule, user} tuples.
>> Based on the rule definition and the stage the user is in for a given rule, it
>> does a series of async RPC checks and side-outputs to various sinks.
>>    - If a {rule, user} tuple needs to stay in operator state longer (1 day),
>>    there should be a window following the async RPC checks, with a customized
>>    purge trigger that fires and purges tuples that either pass the check or
>>    have expired.
>>    - If a {rule, user} execution reaches a stage that waits for a Kafka event,
>>    it should be added to the cache and hooked up with the CoMap lookups near
>>    the sources.
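[Editorial note: the fan-out step described above can be sketched as follows (hypothetical names, plain JDK code, not Flink operators): look up which rules subscribe to an event type and emit one (rule, user) pair per matching rule, so downstream operators can be keyed by that pair.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical fan-out: one (rule, user) pair per rule subscribed
// to the incoming event type.
public class RuleFanOut {
    public static List<String[]> fanOut(String eventType, String userId,
                                        Map<String, List<String>> eventToRules) {
        List<String[]> out = new ArrayList<>();
        for (String rule : eventToRules.getOrDefault(eventType, List.of())) {
            out.add(new String[] {rule, userId});
        }
        return out;
    }
}
```

In a real job this would be a flatMap keyed afterwards by (rule, user), which is what makes the per-tuple state and timers discussed above key-grouped.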
>>  Does that make sense?
>> Thanks,
>> Chen
