From: "Matthias J. Sax"
Date: Wed, 15 Jul 2015 15:32:27 +0200
To: user@flink.apache.org
Subject: Re: Order groups by their keys

Hi Robert,

global sorting of the final output is currently not supported by Flink
out of the box. The reason is that a global sort requires all data to be
processed by a single node (which contradicts data parallelism).

For small outputs, you could use a final "reduce" with no key (i.e., all
data goes to a single group) and dop=1 (degree of parallelism 1), and do
the sorting in memory in your own UDF.

Hope this helps.

-Matthias

On 07/15/2015 02:56 PM, Robert Schmidtke wrote:
> Hey everyone,
>
> I'm currently trying to implement TPC-H Q1 and that involves ordering of
> results. Now I'm not too familiar with the transformations yet, however
> for the life of me I cannot figure out how to get it to work.
> Consider the following toy example:
>
> final ExecutionEnvironment env = ExecutionEnvironment
>     .getExecutionEnvironment();
> DataSet<Tuple3<String, Integer, Integer>> elems = env.fromElements(
>     new Tuple3<String, Integer, Integer>("a", 2, 1),
>     new Tuple3<String, Integer, Integer>("b", 1, 2),
>     new Tuple3<String, Integer, Integer>("a", 1, 3),
>     new Tuple3<String, Integer, Integer>("b", 1, 4),
>     new Tuple3<String, Integer, Integer>("a", 1, 5),
>     new Tuple3<String, Integer, Integer>("b", 2, 6),
>     new Tuple3<String, Integer, Integer>("a", 2, 7),
>     new Tuple3<String, Integer, Integer>("b", 2, 8));
> elems.groupBy(0, 1).sum(2).print();
>
> I want the output to be:
> (a,1,8)
> (a,2,8)
> (b,1,6)
> (b,2,14)
>
> However the output is:
> (a,2,8)
> (b,1,6)
> (b,2,14)
> (a,1,8)
>
> No matter where I place sorting of partitions or groups transformations
> (strange enough I just realized that when I don't add any ordering, the
> output is as expected; however this is just the case in the toy example
> and not in my TPC-H Q1). Is it currently not possible to achieve an
> ordered output in this case? Please bear with me if I overlooked the
> obvious, but I could not get a clear picture from the documentation.
>
> Btw. the code is right
> here: https://github.com/robert-schmidtke/flink-benchmarks/blob/master/xtreemfs-flink-benchmark/src/main/java/org/xtreemfs/flink/benchmark/TPCH1Benchmark.java#L137
> I verified the results with the provided data from TPC-H, apart from the
> sorting everything is fine.
>
> Thanks a bunch in advance,
>
> Cheers
> Robert
>
> --
> My GPG Key ID: 336E2680
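
A minimal sketch of the approach suggested in the reply, written in plain Java without any Flink dependency: the per-group sums are brought together in one list and sorted in memory by the two grouping keys. The names `SortedGroupSums`, `Row`, and `sumAndSort` are hypothetical and only serve the illustration; `Row` stands in for Flink's `Tuple3`. In an actual Flink job the sorting step would presumably live inside a user-defined function applied to the whole, unkeyed data set with parallelism 1, and it is only viable while the final result fits in the memory of a single node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortedGroupSums {

    /** Hypothetical stand-in for a Tuple3 of (String, Integer, Integer). */
    static final class Row {
        final String f0;
        final int f1;
        final int f2;

        Row(String f0, int f1, int f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + "," + f2 + ")";
        }
    }

    /** Sums f2 per (f0, f1) group, then sorts the groups by f0 and f1. */
    static List<Row> sumAndSort(List<Row> rows) {
        // Step 1: aggregate, mirroring groupBy(0, 1).sum(2) from the toy example.
        Map<String, Row> sums = new LinkedHashMap<>();
        for (Row r : rows) {
            String key = r.f0 + "\u0000" + r.f1; // composite key of both fields
            Row old = sums.get(key);
            int total = (old == null ? 0 : old.f2) + r.f2;
            sums.put(key, new Row(r.f0, r.f1, total));
        }

        // Step 2: treat all aggregated rows as one group and sort them in memory.
        List<Row> result = new ArrayList<>(sums.values());
        result.sort(Comparator.comparing((Row r) -> r.f0)
                .thenComparingInt(r -> r.f1));
        return result;
    }

    public static void main(String[] args) {
        List<Row> elems = new ArrayList<>();
        elems.add(new Row("a", 2, 1));
        elems.add(new Row("b", 1, 2));
        elems.add(new Row("a", 1, 3));
        elems.add(new Row("b", 1, 4));
        elems.add(new Row("a", 1, 5));
        elems.add(new Row("b", 2, 6));
        elems.add(new Row("a", 2, 7));
        elems.add(new Row("b", 2, 8));

        // Prints (a,1,8), (a,2,8), (b,1,6), (b,2,14), one per line.
        for (Row r : sumAndSort(elems)) {
            System.out.println(r);
        }
    }
}
```

The sort is done only after aggregation, so the single node has to hold just one row per group rather than the full input.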