From: Yury Ruchin
Date: Tue, 20 Dec 2016 12:50:10 +0300
Subject: Implementing tee functionality in a streaming job
To: user@flink.apache.org

Hi all,

I have a streaming job in which I want to duplicate the stream at some point, so that one copy ends up in a sink and the other continues down the processing pipeline. In effect, I want to implement something similar to the Linux "tee" utility.
I tried a naive approach that looked like this:

val streamToMirror = env.addSource(mySource).<some operators here>
streamToMirror.addSink(someSink) // (1) tee
streamToMirror.<more operators here>.addSink(anotherSink) // (2) process further

But the above results in the stream being split between someSink (1) and the further processors (2) in a seemingly nondeterministic way.
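
For anyone who wants to reproduce this in isolation, here is a minimal sketch of the same fan-out, assuming the Scala DataStream API and a job run locally in a single JVM. The names TeeSink, MainSink, Collected and TeeCheck are hypothetical and introduced only for illustration, and fromElements plus map(_ * 10) stand in for the elided source and operators. Each sink records what it receives, so a split would show up as two counts that are each smaller than the input size.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala._

// Hypothetical JVM-wide holders for what each branch receives.
// Assumption: the job runs in one local JVM, so the sinks share these queues.
object Collected {
  val tee = new ConcurrentLinkedQueue[Int]()
  val main = new ConcurrentLinkedQueue[Int]()
}

// Stand-in for someSink: branch (1), the "tee" copy.
class TeeSink extends SinkFunction[Int] {
  override def invoke(value: Int): Unit = Collected.tee.add(value)
}

// Stand-in for anotherSink: branch (2), after further processing.
class MainSink extends SinkFunction[Int] {
  override def invoke(value: Int): Unit = Collected.main.add(value)
}

object TeeCheck {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stand-in for env.addSource(mySource).<some operators here>
    val streamToMirror: DataStream[Int] = env.fromElements(1, 2, 3, 4, 5)

    streamToMirror.addSink(new TeeSink)               // (1) tee
    streamToMirror.map(_ * 10).addSink(new MainSink)  // (2) process further

    env.execute("tee-check")

    println(s"tee branch:  ${Collected.tee.size} records")
    println(s"main branch: ${Collected.main.size} records")
  }
}
```

If every branch receives the full stream, both counts equal the number of input elements. On a remote or multi-JVM cluster the in-memory queues would not be shared, so this check only makes sense locally.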

I could write through Kafka as shown in this pseudocode:

val headStream = env.addSource(mySource).<some operators here>
headStream.addSink(KafkaSink("myTopic"))
val tailStream = env.addSource(KafkaSource("myTopic")).<more operators here>

But this would incur additional latency and deserialization overhead that I would like to avoid.

Is there any way to implement stream teeing as described?

Thanks,
Yury