flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oytun Tez <oy...@motaword.com>
Subject Re: Calculating over multiple streams...
Date Fri, 22 Feb 2019 16:39:14 GMT
Restructuring with your tip now, Michael, thank you!

---
Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oytun@motaword.com — www.motaword.com


On Fri, Feb 22, 2019 at 11:23 AM Michael Latta <mlatta@technomage.com>
wrote:

> You may want to union the 3 streams prior to the process function if they
> are independently processed.
>
>
> Michael
>
> On Feb 22, 2019, at 9:15 AM, Oytun Tez <oytun@motaword.com> wrote:
>
> Hi everyone!
>
> I've been struggling with an implementation problem in the last days,
> which I am almost sure caused by my misunderstanding of Flink.
>
> The purpose: consume multiple streams, update a score list (with meta data
> e.g. user_id) for each update coming from any of the streams. The new
> output list will also need to be used by another pattern.
>
>    1. We created 3 SourceFunctions, that periodically go to our MySQL
>    database and stream new results back. This one returns POJOs.
>    2. Then we flatMap these streams to unify their Type. They are now all
>    Tuple3s with matching types.
>    3. And we process each stream with the same ProcessFunction.
>    4. I am stuck with the output list.
>
> Business case (human translation workflow):
>
>    1. Input: Stream "translation quality" score updates of each
>    translator [translator_id, score]
>    2. Input: Stream "responsivity score" updates of each translator
>    (email open rates/speeds etc) [translator_id, score]
>    3. Input: Stream "number of projects" updates each translator worked
>    on [translator_id, score]
>    4. Calculation: for each translator, use 3 scores to come up with a
>    unified score and its percentile over all translators. This step definitely
>    feels like a Batch job, but I am pushing to go with a streaming mindset.
>    5. So now supposedly, in this way or another, I have a list of
>    translators with their unified score and percentile over this list.
>    6. Another independent stream should send me updates on "need for
>    proofreaders" – I couldn't even come to this point yet. Once a need info is
>    streamed, application would fetch the previously calculated list and let's
>    say picks the top X determined by the message from need algorithm.
>
>
> <image.png>
>
> Overall, my desire is to make everything a stream and let the data and
> decisions constantly react to stream updates. I am very confused at this
> point. Tried using keyed and operator states, but they seem to be keeping
> their state only for their own items. Considering to do Batch instead after
> all the struggle.
>
> Any ideas? I can even get on a call.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ---
> Oytun Tez
>
> *M O T A W O R D*
> The World's Fastest Human Translation Platform.
> oytun@motaword.com — www.motaword.com
>
>

Mime
View raw message