flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tony Wei <tony19920...@gmail.com>
Subject Re: Broadcast to all the other operators
Date Thu, 09 Nov 2017 10:09:37 GMT
Hi Sadok,

The sample code is just an example to show you how to broadcast the rules
to all subtasks, but the output from CoFlatMap is not necessary to be
Tuple2<Rule, Record>. It depends on what you actually need in your Rule
Engine project.
For example, if you can apply rule on each record directly, you can emit
processed records to keyed operator.
IMHO, the scenario in the article you mentioned is having serval
well-prepared rules to enrich data, and using DSL files to decide what
rules that incoming event needs. After enriching, the features for the
particular event will be grouped by its random id and be calculated by the
models.
I think this approach might be close to the solution in that article, but
it could have some difference according to different use cases.

Best Regards,
Tony Wei


2017-11-09 17:27 GMT+08:00 Ladhari Sadok <laadhari.sadok@gmail.com>:

>
> ---------- Forwarded message ----------
> From: Ladhari Sadok <laadhari.sadok@gmail.com>
> Date: 2017-11-09 10:26 GMT+01:00
> Subject: Re: Broadcast to all the other operators
> To: Tony Wei <tony19920430@gmail.com>
>
>
> Thanks Tony for your very fast answer ,
>
> Yes it resolves my problem that way, but with flatMap I will get
> Tuple2<Rule, Record> always in the processing function (<NULL ,Record> in
> case of no rules update available and <newRule,Record> in the other case ).
> There is no optimization of this solution ? Do you think it is the same
> solution in this picture : https://data-artisans.com/wp-c
> ontent/uploads/2017/10/streaming-in-definitions.png ?
>
> Best regards,
> Sadok
>
>
> Le 9 nov. 2017 9:21 AM, "Tony Wei" <tony19920430@gmail.com> a écrit :
>
> Hi Sadok,
>
> What I mean is to keep the rules in the operator state. The event in Rule
> Stream is just the change log about rules.
> For more specific, you can fetch the rules from Redis in the open step of
> CoFlatMap and keep them in the operator state, then use Rule Stream to
> notify the CoFlatMap to 1. update some rules or 2. refetch all rules from
> Redis.
> Is that what you want?
>
> Best Regards,
> Tony Wei
>
> 2017-11-09 15:52 GMT+08:00 Ladhari Sadok <laadhari.sadok@gmail.com>:
>
>> Thank you for the answer, I know that solution, but I don't want to
>> stream the rules all time.
>> In my case I have the rules in Redis and at startup of flink they are
>> loaded.
>>
>> I want to broadcast changes just when it occurs.
>>
>> Thanks.
>>
>> Le 9 nov. 2017 7:51 AM, "Tony Wei" <tony19920430@gmail.com> a écrit :
>>
>>> Hi Sadok,
>>>
>>> Since you want to broadcast Rule Stream to all subtasks, it seems that
>>> it is not necessary to use KeyedStream.
>>> How about use broadcast partitioner, connect two streams to attach the
>>> rule on each record or imply rule on them directly, and do the key operator
>>> after that?
>>> If you need to do key operator and apply the rules, it should work by
>>> changing the order.
>>>
>>> The code might be something like this, and you can change the rules'
>>> state in the CoFlatMapFunction.
>>>
>>> DataStream<Rule> rules = ...;
>>> DataStream<Record> records = ...;
>>> DataStream<Tuple2<Rule, Record>> recordWithRule =
>>> rules.broadcast().connect(records).flatMap(...);
>>> dataWithRule.keyBy(...).process(...);
>>>
>>> Hope this will make sense to you.
>>>
>>> Best Regards,
>>> Tony Wei
>>>
>>> 2017-11-09 6:25 GMT+08:00 Ladhari Sadok <laadhari.sadok@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> I'm working on Rules Engine project with Flink 1.3, in this project I
>>>> want to update some keyed operator state when external event occurred.
>>>>
>>>> I have a Datastream of updates (from kafka) I want to broadcast the
>>>> data contained in this stream to all keyed operator so I can change the
>>>> state in all operators.
>>>>
>>>> It is like this use case :
>>>> Image : https://data-artisans.com/wp-content/uploads/2017/10/streami
>>>> ng-in-definitions.png
>>>> All article : https://data-artisans.com/blog
>>>> /real-time-fraud-detection-ing-bank-apache-flink
>>>>
>>>> I founded it in the DataSet API but not in the DataStream API !
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/
>>>> dev/batch/index.html#broadcast-variables
>>>>
>>>> Can some one explain to me who to solve this problem ?
>>>>
>>>> Thanks a lot.
>>>>
>>>> Flinkly regards,
>>>> Sadok
>>>>
>>>
>>>
>
>
>

Mime
View raw message