Date: Wed, 8 Mar 2017 14:14:38 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Commented] (FLINK-1725) New Partitioner for better load balancing for skewed data

[ https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901312#comment-15901312 ]

ASF GitHub Bot commented on FLINK-1725:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/1069

    Can we close this pull request and revisit the feature later?
    The partial grouping does not currently work for windows, rescaling, etc., and adding support for them would be quite involved.

> New Partitioner for better load balancing for skewed data
> ---------------------------------------------------------
>
>                 Key: FLINK-1725
>                 URL: https://issues.apache.org/jira/browse/FLINK-1725
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>    Affects Versions: 0.8.1
>            Reporter: Anis Nasir
>            Assignee: Anis Nasir
>              Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme, which we call Partial Key Grouping. It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement with partial key grouping and whose performance can benefit from it. We think it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines of Java code when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it would be a nice addition to the standard Partitioners available in Flink. If the community thinks it's a good idea, we will be happy to help with the port.
> References:
> [1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2] https://github.com/gdfm/partial-key-grouping

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
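The partial key grouping scheme proposed in the issue above can be sketched in plain Java as follows. This is a minimal, hypothetical illustration, not the Storm code in [2] and not a Flink API: it assumes the "power of two choices" idea from [1], where each key has two candidate workers chosen by two differently seeded hash functions, and each sender routes a record to the candidate it has itself sent fewer records to (a purely local load estimate, with no coordination). The class name `PartialKeyGrouping`, the methods `candidates`/`choose`/`loads`, the seeds, and the use of MurmurHash3's 64-bit finalizer as the mixing function are all assumptions made for this sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of Partial Key Grouping: every key has two candidate
 * workers, picked by two differently seeded hash functions, and this sender
 * routes each record to whichever candidate it has sent fewer records to.
 */
final class PartialKeyGrouping {
    // Arbitrary 64-bit seeds (assumption); any two distinct values work.
    private static final long SEED_A = 0x9E3779B97F4A7C15L;
    private static final long SEED_B = 0xC2B2AE3D27D4EB4FL;

    /** Records this sender has routed to each worker (its local load estimate). */
    private final long[] sent;

    PartialKeyGrouping(int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        this.sent = new long[numWorkers];
    }

    /** MurmurHash3's 64-bit finalizer, used here to scramble a seeded hash code. */
    private static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    private int hashToWorker(Object key, long seed) {
        long h = fmix64(key.hashCode() ^ seed);
        return (int) Math.floorMod(h, (long) sent.length);
    }

    /** The two candidate workers for a key (they may coincide). */
    int[] candidates(Object key) {
        return new int[] {hashToWorker(key, SEED_A), hashToWorker(key, SEED_B)};
    }

    /** Routes one record: picks the less loaded candidate and records the send. */
    int choose(Object key) {
        int[] c = candidates(key);
        int target = sent[c[0]] <= sent[c[1]] ? c[0] : c[1];
        sent[target]++;
        return target;
    }

    long[] loads() {
        return sent.clone();
    }

    /** Demo: a skewed stream in which one hot key dominates. */
    public static void main(String[] args) {
        int workers = 4;
        PartialKeyGrouping pkg = new PartialKeyGrouping(workers);
        long[] keyGrouping = new long[workers];
        for (int i = 0; i < 10_000; i++) {
            // Half of all records carry the hot key "hot"; the rest use 250 cold keys.
            String key = (i % 2 == 0) ? "hot" : "k" + (i % 500);
            pkg.choose(key);
            // Plain key grouping for comparison: a single hash, no choice.
            keyGrouping[Math.floorMod(key.hashCode(), workers)]++;
        }
        System.out.println("key grouping:         " + Arrays.toString(keyGrouping));
        System.out.println("partial key grouping: " + Arrays.toString(pkg.loads()));
    }
}
```

With plain key grouping, every record of the hot key lands on one worker; in this sketch a hot key whose two candidates differ alternates between them, roughly halving the worst-case load. The trade-off is that per-key state (counts, windows) can end up split across two workers and must be merged downstream, which is plausibly part of why such a partitioner does not drop in transparently for keyed operators.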