Return-Path: X-Original-To: apmail-beam-commits-archive@minotaur.apache.org Delivered-To: apmail-beam-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 10C4A190CA for ; Thu, 31 Mar 2016 16:00:30 +0000 (UTC) Received: (qmail 141 invoked by uid 500); 31 Mar 2016 16:00:30 -0000 Delivered-To: apmail-beam-commits-archive@beam.apache.org Received: (qmail 99992 invoked by uid 500); 31 Mar 2016 16:00:30 -0000 Mailing-List: contact commits-help@beam.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.incubator.apache.org Delivered-To: mailing list commits@beam.incubator.apache.org Received: (qmail 99982 invoked by uid 99); 31 Mar 2016 16:00:29 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Mar 2016 16:00:29 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9240CC0217 for ; Thu, 31 Mar 2016 16:00:29 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -4.021 X-Spam-Level: X-Spam-Status: No, score=-4.021 tagged_above=-999 required=6.31 tests=[KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.001] autolearn=disabled Received: from mx2-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id LzVGnRa17MjQ for ; Thu, 31 Mar 2016 16:00:28 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx2-lw-us.apache.org (ASF Mail Server at mx2-lw-us.apache.org) with SMTP id A2EAA5F47C for ; Thu, 31 Mar 2016 16:00:28 +0000 (UTC) Received: (qmail 99686 invoked by uid 99); 31 Mar 2016 16:00:28 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Mar 2016 16:00:28 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id CB4742C14FB for ; Thu, 31 Mar 2016 16:00:27 +0000 (UTC) Date: Thu, 31 Mar 2016 16:00:27 +0000 (UTC) From: "Eugene Kirpichov (JIRA)" To: commits@beam.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (BEAM-68) Support for limiting parallelism of a step MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220089#comment-15220089 ] Eugene Kirpichov commented on BEAM-68: -------------------------------------- I re-duped BEAM-169 against BEAM-92. - I agree that it won't work with empty K, however that should be relatively unlikely if the user has enough data that sharding it makes sense, and if their hash function is good; and in some sharded sink scenarios it may be desirable to not write empty shards. - I'm not sure what you mean by "won't scale": individual shards have to be written sequentially, but they can be written in parallel with each other in this proposed implementation: dynamic rebalancing will separate the shard keys from each other. > Support for limiting parallelism of a step > ------------------------------------------ > > Key: BEAM-68 > URL: https://issues.apache.org/jira/browse/BEAM-68 > Project: Beam > Issue Type: New Feature > Components: beam-model > Reporter: Daniel Halperin > > Users may want to limit the parallelism of a step. Two classic uses cases are: > - User wants to produce at most k files, so sets TextIO.Write.withNumShards(k). > - External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call. > Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys will guarantee that only k elements are produced, but runners are free to break fusion in ways that each element may be processed in parallel later. > To implement this functionaltiy, I believe we need to add this support to the Beam Model. -- This message was sent by Atlassian JIRA (v6.3.4#6332)