From user-return-17902-archive-asf-public=cust-asf.ponee.io@flink.apache.org Wed Jan 31 07:05:04 2018 Return-Path: X-Original-To: archive-asf-public@eu.ponee.io Delivered-To: archive-asf-public@eu.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by mx-eu-01.ponee.io (Postfix) with ESMTP id AB6E8180662 for ; Wed, 31 Jan 2018 07:05:04 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 9B6C0160C54; Wed, 31 Jan 2018 06:05:04 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id EC51D160C53 for ; Wed, 31 Jan 2018 07:05:03 +0100 (CET) Received: (qmail 64060 invoked by uid 500); 31 Jan 2018 06:05:02 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@flink.apache.org Received: (qmail 64050 invoked by uid 99); 31 Jan 2018 06:05:02 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 31 Jan 2018 06:05:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id F29BED75EB for ; Wed, 31 Jan 2018 06:05:01 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -16.331 X-Spam-Level: X-Spam-Status: No, score=-16.331 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, ENV_AND_HDR_SPF_MATCH=-0.5, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_DKIM_WL=-7.5, USER_IN_DEF_SPF_WL=-7.5] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=citi.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id hKV9qAxsFOke for ; Wed, 31 Jan 2018 06:05:01 +0000 (UTC) Received: from mx0a-00123c01.pphosted.com (mx-a.mail.citi.com [67.231.145.106]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 3AA0E5FB2F for ; Wed, 31 Jan 2018 06:05:00 +0000 (UTC) Received: from pps.filterd (m0030125.ppops.net [127.0.0.1]) by mx0a-00123c02.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id w0V61crm019479; Wed, 31 Jan 2018 06:04:51 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=citi.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-transfer-encoding : mime-version; s=maila; bh=rmt7TjdJTsiejmvkRg9DwETln8j4hukgeYCnomqk5a4=; b=c/168wfjCtiPqzehFVVPk6sX76lxr/LYCx/Rx2AjrSHQZ9BVxEA/JxW/Piftf/Cxzc+E LOIGYuqXvy+n36ccWBrtny7Qu44XUCzMYWdc1/dpVHCANpBNnpQKPo71ALRcvomKU2dO TiNCz6Lxeg+BlgTSN7Xtf84SJDXG4Y+uggGJkJ90bD9wmDyNKwUQajDnyXVJGpHj7Az1 4i4NDBwoFF9VZuJwqs1VsGaGcO0Q6/BAh8AgpVpMfyOsd2p6RQNi1bf/S8Kw3T5+1h1L EzrLgdw/ZzXPpbF44UUg5Xic16mqEyZld3x4mgCqBCDEgaltiA4JtjmiMp1qEOpvsVBa AQ== Received: from mail.citigroup.com (smtpoutbound.citigroup.com [192.193.193.15]) by mx0a-00123c02.pphosted.com with ESMTP id 2frg0p41d1-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT); Wed, 31 Jan 2018 06:04:51 +0000 Received: from imbhub-gt35.nam.nsroot.net (imbhub-gt35.nam.nsroot.net [169.171.85.124]) by smtpinbound.citigroup.com (Sentrion-MTA-4.3.1/Sentrion-MTA-4.2.2) with ESMTP id w0V64oJI005283; Wed, 31 Jan 2018 06:04:50 GMT Received: from imbdlprt-gt05.nam.nsroot.net (imbdlprt-gt05.nam.nsroot.net [153.40.239.194]) by imbhub-gt35.nam.nsroot.net (Sentrion-MTA-4.3.1/Sentrion-MTA-4.2.2) with ESMTP id w0V64nuk009824; Wed, 31 Jan 2018 06:04:50 GMT Received: from imbdlpbuf-gt02.nam.nsroot.net (namdlpdimpmw14.nam.nsroot.net [144.215.120.93]) by imbdlprt-gt05.nam.nsroot.net (Sentrion-MTA-4.3.1/Sentrion-MTA-4.2.2) with ESMTP id w0V64nJG021364; Wed, 31 Jan 2018 06:04:49 GMT Received: from EXLNIHT10.eur.nsroot.net (EXLNIHT10.eur.nsroot.net [169.182.85.60]) by imbdlpbuf-gt02.nam.nsroot.net (Sentrion-MTA-4.3.1/Sentrion-MTA-4.2.2) with ESMTP id w0V64O9n027241; Wed, 31 Jan 2018 06:04:48 GMT Received: from EXLNHT09.eur.nsroot.net (169.182.86.179) by EXLNIHT10.eur.nsroot.net (169.182.85.60) with Microsoft SMTP Server (TLS) id 14.3.361.1; Wed, 31 Jan 2018 06:04:44 +0000 Received: from EXLNMB58.eur.nsroot.net ([169.254.2.177]) by exlnht09.eur.nsroot.net ([169.182.86.179]) with mapi id 14.03.0361.001; Wed, 31 Jan 2018 06:04:44 +0000 From: "Marchant, Hayden " To: Stefan Richter CC: "user@flink.apache.org" , Aljoscha Krettek Subject: RE: Joining data in Streaming Thread-Topic: Joining data in Streaming Thread-Index: AdOZqP0FrX0/Pk2RRL24Na2Vy5XF/AALB28AACDyZOA= Date: Wed, 31 Jan 2018 06:04:43 +0000 Message-ID: <99B8991EBC5DF841B18C71E56C02D4A010D56678@EXLNMB58.eur.nsroot.net> References: <99B8991EBC5DF841B18C71E56C02D4A010D5616C@EXLNMB58.eur.nsroot.net> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [169.182.87.212] x-wiganss: 01000000010017exlnht09.eur.nsroot.net ID0042<99B8991EBC5DF841B18C71E56C02D4A010D56678@EXLNMB58.eur.nsroot.net> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-CFilter-Loop: Reflected X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-01-31_01:,, signatures=0 Stefan, So are we essentially saying that in this case, for now, I should stick to = DataSet / Batch Table API? Thanks, Hayden -----Original Message----- From: Stefan Richter [mailto:s.richter@data-artisans.com]=20 Sent: Tuesday, January 30, 2018 4:18 PM To: Marchant, Hayden [ICG-IT] Cc: user@flink.apache.org; Aljoscha Krettek Subject: Re: Joining data in Streaming Hi, as far as I know, this is not easily possible. What would be required is so= mething like a CoFlatmap function, where one input stream is blocking until= the second stream is fully consumed to build up the state to join against.= Maybe Aljoscha (in CC) can comment on future plans to support this. Best, Stefan > Am 30.01.2018 um 12:42 schrieb Marchant, Hayden : >=20 > We have a use case where we have 2 data sets - One reasonable large data = set (a few million entities), and a smaller set of data. We want to do a jo= in between these data sets. We will be doing this join after both data sets= are available. In the world of batch processing, this is pretty straightf= orward - we'd load both data sets into an application and execute a join op= erator on them through a common key. Is it possible to do such a join usi= ng the DataStream API? I would assume that I'd use the connect operator, th= ough I'm not sure exactly how I should do the join - do I need one 'smaller= ' set to be completely loaded into state before I start flowing the large s= et? My concern is that if I read both data sets from streaming sources, sin= ce I can't be guaranteed of the order that the data is loaded, I may lose l= ots of potential joined entities since their pairs might not have been read= yet.=20 >=20 >=20 > Thanks, > Hayden Marchant >=20 >=20