flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Latta <mla...@technomage.com>
Subject Re: Calculating over multiple streams...
Date Fri, 22 Feb 2019 16:23:45 GMT
You may want to union the 3 streams prior to the process function if they are independently
processed. 


Michael

> On Feb 22, 2019, at 9:15 AM, Oytun Tez <oytun@motaword.com> wrote:
> 
> Hi everyone!
> 
> I've been struggling with an implementation problem in the last days, which I am almost
sure caused by my misunderstanding of Flink. 
> 
> The purpose: consume multiple streams, update a score list (with meta data e.g. user_id)
for each update coming from any of the streams. The new output list will also need to be used
by another pattern.
> We created 3 SourceFunctions, that periodically go to our MySQL database and stream new
results back. This one returns POJOs.
> Then we flatMap these streams to unify their Type. They are now all Tuple3s with matching
types.
> And we process each stream with the same ProcessFunction.
> I am stuck with the output list.
> Business case (human translation workflow):
> Input: Stream "translation quality" score updates of each translator [translator_id,
score]
> Input: Stream "responsivity score" updates of each translator (email open rates/speeds
etc) [translator_id, score]
> Input: Stream "number of projects" updates each translator worked on [translator_id,
score]
> Calculation: for each translator, use 3 scores to come up with a unified score and its
percentile over all translators. This step definitely feels like a Batch job, but I am pushing
to go with a streaming mindset.
> So now supposedly, in this way or another, I have a list of translators with their unified
score and percentile over this list.
> Another independent stream should send me updates on "need for proofreaders" – I couldn't
even come to this point yet. Once a need info is streamed, application would fetch the previously
calculated list and let's say picks the top X determined by the message from need algorithm.
> 
> <image.png>
> 
> Overall, my desire is to make everything a stream and let the data and decisions constantly
react to stream updates. I am very confused at this point. Tried using keyed and operator
states, but they seem to be keeping their state only for their own items. Considering to do
Batch instead after all the struggle.
> 
> Any ideas? I can even get on a call.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ---
> Oytun Tez
> 
> M O T A W O R D
> The World's Fastest Human Translation Platform.
> oytun@motaword.com — www.motaword.com

Mime
View raw message