Subject: Re: Calculating over multiple streams...
From: Michael Latta <mlatta@technomage.com>
Date: Fri, 22 Feb 2019 09:23:45 -0700
To: Oytun Tez <oytun@motaword.com>
Cc: user@flink.apache.org

You may want to union the 3 streams prior to the process function if they are independently processed.
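Roughly something like the sketch below. This is untested, the common Tuple3<translator_id, score_type, score> layout and all the names (UnionScoresJob, UnifiedScoreFunction, the score_type strings) are just my guess at your setup, and the process function itself is only sketched at the end of this mail.

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnionScoresJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-ins for your three MySQL-polling sources, already flatMapped
        // to a common Tuple3<translator_id, score_type, score> shape.
        DataStream<Tuple3<String, String, Double>> quality =
                env.fromElements(Tuple3.of("t1", "quality", 0.9));
        DataStream<Tuple3<String, String, Double>> responsivity =
                env.fromElements(Tuple3.of("t1", "responsivity", 0.7));
        DataStream<Tuple3<String, String, Double>> projects =
                env.fromElements(Tuple3.of("t1", "projects", 12.0));

        // Union first, so a single keyed function sees every score update
        // for a given translator instead of three separate keyed states.
        quality.union(responsivity, projects)
               .keyBy(t -> t.f0)                    // key by translator_id
               .process(new UnifiedScoreFunction()) // sketched at the end of this mail
               .print();

        env.execute("unified-translator-scores");
    }
}

With the union in place, each translator key sees updates from all three sources, so one piece of keyed state can hold everything you need per translator.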
Michael

> On Feb 22, 2019, at 9:15 AM, Oytun Tez <oytun@motaword.com> wrote:
>
> Hi everyone!
>
> I've been struggling with an implementation problem in the last few days, which I am almost sure is caused by my misunderstanding of Flink.
>
> The purpose: consume multiple streams and update a score list (with metadata, e.g. user_id) for each update coming from any of the streams. The new output list will also need to be used by another pattern.
>
> 1. We created 3 SourceFunctions that periodically go to our MySQL database and stream new results back. These return POJOs.
> 2. Then we flatMap these streams to unify their type. They are now all Tuple3s with matching types.
> 3. And we process each stream with the same ProcessFunction.
> 4. I am stuck with the output list.
>
> Business case (human translation workflow):
>
> 1. Input: stream of "translation quality" score updates for each translator [translator_id, score]
> 2. Input: stream of "responsivity score" updates for each translator (email open rates/speeds etc.) [translator_id, score]
> 3. Input: stream of "number of projects" updates for each translator [translator_id, score]
> 4. Calculation: for each translator, use the 3 scores to come up with a unified score and its percentile over all translators. This step definitely feels like a Batch job, but I am pushing to go with a streaming mindset.
> 5. So now, supposedly, one way or another, I have a list of translators with their unified score and percentile over this list.
> 6. Another independent stream should send me updates on "need for proofreaders" – I couldn't even get to this point yet. Once a need update is streamed, the application would fetch the previously calculated list and, let's say, pick the top X determined by the message from the need algorithm.
>
> <image.png>
>
> Overall, my desire is to make everything a stream and let the data and decisions constantly react to stream updates. I am very confused at this point. I tried using keyed and operator states, but they seem to keep their state only for their own items. I am considering doing Batch instead after all this struggle.
>
> Any ideas? I can even get on a call.
>
> ---
> Oytun Tez
>
> M O T A W O R D
> The World's Fastest Human Translation Platform.
> oytun@motaword.com — www.motaword.com
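PS: for the keyed state part (step 4 of your business case above), a rough, untested sketch of the process function could look like the one below. It keeps the latest value of each of the three scores per translator and re-emits a unified score on every update. The class name, the score_type strings and the 0.5/0.3/0.2 weights are placeholders, and the percentile over all translators would still need a separate, non-keyed step that this sketch does not cover.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Keyed by translator_id; input is Tuple3<translator_id, score_type, score>,
// output is Tuple2<translator_id, unified_score>.
public class UnifiedScoreFunction
        extends KeyedProcessFunction<String, Tuple3<String, String, Double>, Tuple2<String, Double>> {

    private transient ValueState<Double> quality;
    private transient ValueState<Double> responsivity;
    private transient ValueState<Double> projects;

    @Override
    public void open(Configuration parameters) {
        quality = getRuntimeContext().getState(
                new ValueStateDescriptor<>("quality", Double.class));
        responsivity = getRuntimeContext().getState(
                new ValueStateDescriptor<>("responsivity", Double.class));
        projects = getRuntimeContext().getState(
                new ValueStateDescriptor<>("projects", Double.class));
    }

    @Override
    public void processElement(Tuple3<String, String, Double> update,
                               Context ctx,
                               Collector<Tuple2<String, Double>> out) throws Exception {
        // f1 carries the score type, f2 the score value; store it in the matching state.
        switch (update.f1) {
            case "quality":      quality.update(update.f2);      break;
            case "responsivity": responsivity.update(update.f2); break;
            case "projects":     projects.update(update.f2);     break;
        }

        // Placeholder weighting; scores that have not arrived yet default to 0.
        double q = quality.value()      == null ? 0.0 : quality.value();
        double r = responsivity.value() == null ? 0.0 : responsivity.value();
        double p = projects.value()     == null ? 0.0 : projects.value();
        double unified = 0.5 * q + 0.3 * r + 0.2 * p;

        out.collect(Tuple2.of(ctx.getCurrentKey(), unified));
    }
}

For your step 6, the stream of unified scores coming out of this function could then be combined with the "need for proofreaders" stream (connect with a CoProcessFunction, or broadcast state), so the need side can react to the latest scores instead of falling back to a batch job.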