From: "Daniel Halperin (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Date: Thu, 20 Oct 2016 18:11:58 +0000 (UTC)
Subject: [jira] [Updated] (BEAM-217) BoundedSource.splitAtFraction should be splitAfterFraction

     [ https://issues.apache.org/jira/browse/BEAM-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Halperin updated BEAM-217:
---------------------------------
    Labels: backward-incompatible  (was: )

> BoundedSource.splitAtFraction should be splitAfterFraction
> ----------------------------------------------------------
>
>                 Key: BEAM-217
>                 URL: https://issues.apache.org/jira/browse/BEAM-217
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Minor
>              Labels: backward-incompatible
>
> Dynamic work rebalancing works by 1) determining how long the bundle should take in order not to be a straggler (the "deadline"), 2) predicting where the bundle will be (position or fraction) by that deadline, and 3) requesting an atomic split (splitAtFraction).
> Currently all BoundedSources and (in the Dataflow runner) NativeReaderIterators refuse splits if they have already consumed the requested split position.
> Splitting a task [A, C) at position B generates [A, B) and [B, C). So if we predict that by the deadline the task will have last consumed position X, we should split not "at" X but "after" X (i.e. at next(X)): into [A, X] (because X is already consumed) and (X, C), or equivalently [A, next(X)) and [next(X), C).
> One way to fit this into the current BoundedSource API is to rename splitAtFraction to splitAfterFraction and adjust the documentation. The documentation of getFractionConsumed also needs to be clarified to emphasize that it should return the fraction of all positions in the source that have already been consumed, including the position of the last consumed record. For example, for an index-range task with range [0, 5), after it has read the first record at position 0, it has consumed 20%, rather than 0% (and of course not 40%, even if an internal "next index" variable is now 1). This mistake is especially easy to make in a file-based source if you base the calculation on the file's offset *after* consuming the record; the correct way is to calculate based on the offsets of the beginnings of records.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
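
To make the semantics proposed in the issue concrete, here is a minimal Java sketch of an index-range task. IndexRangeTracker and all of its methods are hypothetical names made up for illustration; this is not the Beam BoundedSource API, only a toy model of the two rules described above: the consumed fraction includes the last consumed record, and a split happens "after" the predicted position. Mapping a fraction to a position with Math.ceil is also an assumption of this sketch, and a real source would need more care with floating-point rounding.

```java
/** Hypothetical tracker for an index-range task [start, end); not part of Beam. */
final class IndexRangeTracker {
    private final long start;
    private long end;          // exclusive; shrinks when a split is accepted
    private long lastConsumed; // position of the last consumed record, start - 1 if none

    IndexRangeTracker(long start, long end) {
        if (end <= start) {
            throw new IllegalArgumentException("range must be non-empty");
        }
        this.start = start;
        this.end = end;
        this.lastConsumed = start - 1;
    }

    /** Consumes the next record; returns false once the range is exhausted. */
    boolean advance() {
        if (lastConsumed + 1 >= end) {
            return false;
        }
        lastConsumed++;
        return true;
    }

    /** Fraction of positions consumed, counting the last consumed record itself. */
    double getFractionConsumed() {
        return (double) (lastConsumed - start + 1) / (end - start);
    }

    /**
     * Splits after the position reached at the given fraction. On success the
     * tracker keeps [start, splitPos) and the residual {splitPos, oldEnd} is
     * returned; null means the split was refused.
     */
    long[] splitAfterFraction(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        // Assumption of this sketch: "after fraction f" means the first
        // position not covered by the first f of the range, i.e. next(X).
        long splitPos = start + (long) Math.ceil(fraction * (end - start));
        // Refuse if the residual would start at an already consumed position,
        // or if either side of the split would be empty.
        if (splitPos <= lastConsumed || splitPos <= start || splitPos >= end) {
            return null;
        }
        long oldEnd = end;
        end = splitPos;
        return new long[] {splitPos, oldEnd};
    }

    public static void main(String[] args) {
        IndexRangeTracker tracker = new IndexRangeTracker(0, 5);
        tracker.advance(); // consumes the record at position 0
        System.out.println("fraction consumed: " + tracker.getFractionConsumed());
        // Splitting after the fraction already consumed is accepted, because
        // the residual starts at next(0) = 1, which has not been consumed.
        long[] residual = tracker.splitAfterFraction(tracker.getFractionConsumed());
        System.out.println("residual: [" + residual[0] + ", " + residual[1] + ")");
    }
}
```

In this toy model, a request to split after the fraction that has just been consumed succeeds, whereas under "at" semantics the same request would name an already consumed position and be refused.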