Date: Wed, 18 Mar 2015 12:19:38 +0000 (UTC)
From: "Anis Nasir (JIRA)"
To: issues@flink.incubator.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-1725) New Partitioner for better load balancing for skewed data

Anis Nasir created FLINK-1725:
---------------------------------

Summary: New Partitioner for better load balancing for skewed data
Key: FLINK-1725
URL:
https://issues.apache.org/jira/browse/FLINK-1725
Project: Flink
Issue Type: Improvement
Components: New Components
Affects Versions: 0.8.1
Reporter: Anis Nasir

Hi,

We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on how keys are distributed across workers when the stream data is skewed.

We developed a new stream partitioning scheme, which we call Partial Key Grouping. It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory.

In the paper we show a number of mining algorithms that are easy to implement with partial key grouping and whose performance can benefit from it. We think it might also be useful for a larger class of algorithms.

Partial key grouping is very easy to implement: it requires just a few lines of Java code when implemented as a custom grouping in Storm [2].

For all these reasons, we believe it would be a nice addition to the standard Partitioners available in Flink. If the community thinks it's a good idea, we will be happy to help with the port.

References:
[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2] https://github.com/gdfm/partial-key-grouping

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
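
For illustration, the core idea of partial key grouping as described in [1] can be sketched as follows: each key is hashed with two different hash functions, and each message goes to whichever of the two candidate workers has received fewer messages from this source so far. This is a minimal Python sketch under stated assumptions, not the Storm implementation in [2] and not a Flink API. The names `PartialKeyGrouper`, `candidates`, `choose`, and `_seeded_hash` are hypothetical, and the seeded SHA-256 hash is a stand-in for whatever pair of hash functions a real implementation would use.

```python
import hashlib
from typing import Tuple


def _seeded_hash(key: str, seed: int, num_workers: int) -> int:
    """Deterministically map (seed, key) to a worker index in [0, num_workers).

    Assumption: a seeded SHA-256 stands in for two independent hash functions.
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class PartialKeyGrouper:
    """Hypothetical helper: routes each key to the less loaded of two candidates."""

    def __init__(self, num_workers: int) -> None:
        if num_workers < 1:
            raise ValueError("num_workers must be positive")
        self.num_workers = num_workers
        # Local estimate only: messages this source has sent to each worker.
        self.load = [0] * num_workers

    def candidates(self, key: str) -> Tuple[int, int]:
        """Return the two candidate workers for a key (they may coincide)."""
        return (
            _seeded_hash(key, 1, self.num_workers),
            _seeded_hash(key, 2, self.num_workers),
        )

    def choose(self, key: str) -> int:
        """Pick the candidate with the smaller local load and record the send."""
        first, second = self.candidates(key)
        target = first if self.load[first] <= self.load[second] else second
        self.load[target] += 1
        return target


if __name__ == "__main__":
    grouper = PartialKeyGrouper(num_workers=4)
    stream = ["hot"] * 8 + ["a", "b", "c", "d"]  # one heavily repeated key
    for key in stream:
        print(key, "->", grouper.choose(key))
    print("per-worker load:", grouper.load)
```

Because a key can be split across two workers, any per-key state downstream is partial, so a later step has to merge the two partial results for each key. Plain key grouping corresponds to always using the first candidate, and shuffle grouping to ignoring the key entirely.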