Return-Path: X-Original-To: apmail-flink-dev-archive@www.apache.org Delivered-To: apmail-flink-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9984418CA5 for ; Mon, 1 Feb 2016 12:05:06 +0000 (UTC) Received: (qmail 20035 invoked by uid 500); 1 Feb 2016 12:05:00 -0000 Delivered-To: apmail-flink-dev-archive@flink.apache.org Received: (qmail 19961 invoked by uid 500); 1 Feb 2016 12:05:00 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 19947 invoked by uid 99); 1 Feb 2016 12:05:00 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Feb 2016 12:05:00 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id C7BF91A01BC for ; Mon, 1 Feb 2016 12:04:59 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.447 X-Spam-Level: *** X-Spam-Status: No, score=3.447 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=3, KAM_LAZY_DOMAIN_SECURITY=1, RP_MATCHES_RCVD=-0.554, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id E2VPBcDHEjgt for ; Mon, 1 Feb 2016 12:04:50 +0000 (UTC) Received: from mail.tu-berlin.de (mail.tu-berlin.de [130.149.7.33]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 6C3C120D7C for ; Mon, 1 Feb 2016 12:04:50 +0000 (UTC) X-tubIT-Incoming-IP: 130.149.6.150 Received: from ex-mbx06.tubit.win.tu-berlin.de ([130.149.6.150] helo=exchange.tu-berlin.de) by mail.tu-berlin.de (exim-4.76/mailfrontend-5) with esmtp for id 1aQDDr-00032A-8W; Mon, 01 Feb 2016 13:04:49 +0100 Received: from [10.245.10.39] (176.2.129.10) by EX-MBX06.tubit.win.tu-berlin.de (130.149.6.150) with Microsoft SMTP Server (TLS) id 15.0.1156.6; Mon, 1 Feb 2016 13:04:34 +0100 User-Agent: K-9 Mail for Android In-Reply-To: References: <56AF37CD.6060204@mailbox.tu-berlin.de> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----LW5M0LIHKOB59OUYYLPXU6GG3PLNQJ" Content-Transfer-Encoding: 8bit Subject: Re: DataExchangeMode.BATCH in iterations From: Fridtjof Sander Date: Mon, 1 Feb 2016 13:04:35 +0100 To: Message-ID: X-ClientProxiedBy: EX-CAS02.tubit.win.tu-berlin.de (130.149.6.142) To EX-MBX06.tubit.win.tu-berlin.de (130.149.6.150) X-PMWin-Version: 4.0.1, Antivirus-Engine: 3.63.0, Antivirus-Data: 5.23 X-PureMessage: [Scanned] ------LW5M0LIHKOB59OUYYLPXU6GG3PLNQJ Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="UTF-8" Hi Fabian, thanks for your explanation! Yeah, I figured that if an easy fix exists, you would have done that yourself. This is more for me to understand the conceptual problem. But back to the pipeline-requirement: Doesn't zipWithIndex violate that too, then? It's also a mapPartitions, collect + broadcast, plus another mapPartitions. This should roughly be the same procedure as building a histogram and propagate partition boundaries, right?. Not much going on there with pipelining. However, I hadn't problems with zipWithIndex inside iterations. Also, is there a difference between the "materialization" you mentioned and the execution of a datasink operator? Again, if all that is written somewhere, just throw me the link, I don't want to waste your time. Best Fridtjof Am 1. Februar 2016 12:21:21 MEZ, schrieb Fabian Hueske : >Hi Fridtjof, > >the range partitioner works by building a histogram for the >partitioning >key. This requires a pass over the whole intermediate data set which >means >it needs to be materialized and cannot be processed in a pipelined >fashion. >However, pipelined data exchange strategies are a requirement for the >data >flows which are executed for iteration bodies. > >This is nothing that can be easily fixed at the moment. Touching this >part >of the runtime code would have major implications. >I afraid, but I believe we have to accept this restriction. > >Best, Fabian > > >2016-02-01 11:47 GMT+01:00 Fridtjof Sander >: > >> Dear Flink-Devs, >> >> I recently ran into a problem where range-partitioning within >iterations >> would be useful. >> >> In the PR for range-partitioning it is said, this doesn't work >because of >> some batched data-exchange mode. >> https://github.com/apache/flink/pull/1255 >> >> I would like to understand the issue with that, but could not find >> articles/blog posts/etc to read about that. >> >> Do you have some pointers for me? Code will also work if the concept >gets >> clear from it. >> >> Thanks for your time! >> >> Best, Fridtjof >> ------LW5M0LIHKOB59OUYYLPXU6GG3PLNQJ--