From: "Matthias J. Sax"
Date: Mon, 15 Jun 2015 14:19:14 +0200
To: user@flink.apache.org
Subject: Re: Random Shuffling
Message-ID: <557EC2C2.5050201@informatik.hu-berlin.de>

I think you need to implement your own Partitioner and pass it in via
DataSet.partitionCustom(partitioner, field). (Just specify any field you
like; since you don't want to group by key, it doesn't matter which.)
When implementing the partitioner, you can ignore the key parameter and
compute the output channel randomly.

This is kind of a workaround, but it should work.

-Matthias

On 06/15/2015 01:49 PM, Maximilian Alber wrote:
> Hi Flinksters,
>
> I would like to shuffle the elements in my data set and then split it in
> two according to some ratio. Each element in the data set has a unique
> id. Is there a nice way to do it with the Flink API?
> (It would be nice to have guaranteed random shuffling.)
> Thanks!
>
> Cheers,
> Max
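
A minimal sketch of the workaround described above, in case it helps to see it spelled out. To keep it runnable without Flink on the classpath, the `Partitioner` interface below is a local stand-in that is assumed to mirror the shape of Flink's `org.apache.flink.api.common.functions.Partitioner<K>`; in a real job you would implement Flink's interface instead. The names `RandomPartitioner` and `RandomShuffleSketch` are hypothetical, and the fixed seed is only there to make the standalone demo repeatable. Note that this only redistributes elements randomly across output channels; splitting by a ratio would still be a separate step.

```java
import java.util.Random;

// Local stand-in, assumed to mirror the shape of Flink's
// org.apache.flink.api.common.functions.Partitioner<K>, so that this
// sketch compiles without Flink. Use Flink's own interface in a real job.
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical name. Ignores the key and picks the output channel at random.
class RandomPartitioner<K> implements Partitioner<K> {
    private final Random random;

    RandomPartitioner(long seed) {
        this.random = new Random(seed);
    }

    @Override
    public int partition(K key, int numPartitions) {
        // The key is deliberately unused: each call returns a uniformly
        // chosen channel index in the range [0, numPartitions).
        return random.nextInt(numPartitions);
    }
}

// Hypothetical demo driver: counts how many ids land on each channel.
class RandomShuffleSketch {
    public static void main(String[] args) {
        Partitioner<Long> partitioner = new RandomPartitioner<>(42L);
        int numPartitions = 4;
        int[] counts = new int[numPartitions];

        for (long id = 0; id < 1000; id++) {
            counts[partitioner.partition(id, numPartitions)]++;
        }
        for (int channel = 0; channel < numPartitions; channel++) {
            System.out.println("channel " + channel + ": " + counts[channel] + " elements");
        }

        // In a Flink job on a tuple data set, the hand-over would look
        // roughly like this (field 0 is arbitrary, since the key is ignored):
        //   dataSet.partitionCustom(new RandomPartitioner<Long>(42L), 0);
    }
}
```

The key field passed to `partitionCustom` is arbitrary here, exactly as noted in the reply, because the partitioner never looks at it.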