Return-Path: X-Original-To: apmail-beam-commits-archive@minotaur.apache.org Delivered-To: apmail-beam-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 74AD1196E0 for ; Thu, 31 Mar 2016 05:06:27 +0000 (UTC) Received: (qmail 6754 invoked by uid 500); 31 Mar 2016 05:06:27 -0000 Delivered-To: apmail-beam-commits-archive@beam.apache.org Received: (qmail 6707 invoked by uid 500); 31 Mar 2016 05:06:27 -0000 Mailing-List: contact commits-help@beam.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.incubator.apache.org Delivered-To: mailing list commits@beam.incubator.apache.org Received: (qmail 6698 invoked by uid 99); 31 Mar 2016 05:06:27 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Mar 2016 05:06:27 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 01D59C6897 for ; Thu, 31 Mar 2016 05:06:27 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -5.016 X-Spam-Level: X-Spam-Status: No, score=-5.016 tagged_above=-999 required=6.31 tests=[KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.996] autolearn=disabled Received: from mx2-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id H36Nns5LVO9B for ; Thu, 31 Mar 2016 05:06:26 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx2-lw-us.apache.org (ASF Mail Server at mx2-lw-us.apache.org) with SMTP id 213F15F2EE for ; Thu, 31 Mar 2016 05:06:26 +0000 (UTC) Received: (qmail 6676 invoked by uid 99); 31 Mar 2016 05:06:25 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Mar 2016 05:06:25 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 78BD92C033A for ; Thu, 31 Mar 2016 05:06:25 +0000 (UTC) Date: Thu, 31 Mar 2016 05:06:25 +0000 (UTC) From: "Daniel Halperin (JIRA)" To: commits@beam.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (BEAM-68) Support for limiting parallelism of a step MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219353#comment-15219353 ] Daniel Halperin commented on BEAM-68: ------------------------------------- Eugene: I disagree with the premise that even the group by key trick works without runner support. Fusion breaks and dynamic work rebalancing violate all such assumptions. Model changes are all that will guarantee anything here. > Support for limiting parallelism of a step > ------------------------------------------ > > Key: BEAM-68 > URL: https://issues.apache.org/jira/browse/BEAM-68 > Project: Beam > Issue Type: New Feature > Components: beam-model > Reporter: Daniel Halperin > > Users may want to limit the parallelism of a step. Two classic uses cases are: > - User wants to produce at most k files, so sets TextIO.Write.withNumShards(k). > - External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call. > Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys will guarantee that only k elements are produced, but runners are free to break fusion in ways that each element may be processed in parallel later. > To implement this functionaltiy, I believe we need to add this support to the Beam Model. -- This message was sent by Atlassian JIRA (v6.3.4#6332)