flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oytun Tez <oy...@motaword.com>
Subject Calculating over multiple streams...
Date Fri, 22 Feb 2019 16:15:36 GMT
Hi everyone!

I've been struggling with an implementation problem in the last days, which
I am almost sure caused by my misunderstanding of Flink.

The purpose: consume multiple streams, update a score list (with meta data
e.g. user_id) for each update coming from any of the streams. The new
output list will also need to be used by another pattern.

   1. We created 3 SourceFunctions, that periodically go to our MySQL
   database and stream new results back. This one returns POJOs.
   2. Then we flatMap these streams to unify their Type. They are now all
   Tuple3s with matching types.
   3. And we process each stream with the same ProcessFunction.
   4. I am stuck with the output list.

Business case (human translation workflow):

   1. Input: Stream "translation quality" score updates of each translator
   [translator_id, score]
   2. Input: Stream "responsivity score" updates of each translator (email
   open rates/speeds etc) [translator_id, score]
   3. Input: Stream "number of projects" updates each translator worked on
   [translator_id, score]
   4. Calculation: for each translator, use 3 scores to come up with a
   unified score and its percentile over all translators. This step definitely
   feels like a Batch job, but I am pushing to go with a streaming mindset.
   5. So now supposedly, in this way or another, I have a list of
   translators with their unified score and percentile over this list.
   6. Another independent stream should send me updates on "need for
   proofreaders" – I couldn't even come to this point yet. Once a need info is
   streamed, application would fetch the previously calculated list and let's
   say picks the top X determined by the message from need algorithm.


[image: image.png]

Overall, my desire is to make everything a stream and let the data and
decisions constantly react to stream updates. I am very confused at this
point. Tried using keyed and operator states, but they seem to be keeping
their state only for their own items. Considering to do Batch instead after
all the struggle.

Any ideas? I can even get on a call.














---
Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oytun@motaword.com — www.motaword.com

Mime
View raw message