Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 858C0200C62 for ; Wed, 26 Apr 2017 23:38:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 82CD0160BB4; Wed, 26 Apr 2017 21:38:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id CED4A160BA8 for ; Wed, 26 Apr 2017 23:38:07 +0200 (CEST) Received: (qmail 71302 invoked by uid 500); 26 Apr 2017 21:38:07 -0000 Mailing-List: contact commits-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list commits@beam.apache.org Received: (qmail 71292 invoked by uid 99); 26 Apr 2017 21:38:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Apr 2017 21:38:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 8C2AA1B05C2 for ; Wed, 26 Apr 2017 21:38:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id yuklIvjBRrpP for ; Wed, 26 Apr 2017 21:38:05 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id C048E5F177 for ; Wed, 26 Apr 2017 21:38:05 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 52C7DE0BDF for ; Wed, 26 Apr 2017 21:38:05 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id BEB6621DE1 for ; Wed, 26 Apr 2017 21:38:04 +0000 (UTC) Date: Wed, 26 Apr 2017 21:38:04 +0000 (UTC) From: "Eugene Kirpichov (JIRA)" To: commits@beam.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (BEAM-217) BoundedSource.splitAtFraction should be splitAfterFraction MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 26 Apr 2017 21:38:08 -0000 [ https://issues.apache.org/jira/browse/BEAM-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Kirpichov updated BEAM-217: ---------------------------------- Fix Version/s: (was: First stable release) Not applicable > BoundedSource.splitAtFraction should be splitAfterFraction > ---------------------------------------------------------- > > Key: BEAM-217 > URL: https://issues.apache.org/jira/browse/BEAM-217 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core > Reporter: Eugene Kirpichov > Assignee: Eugene Kirpichov > Priority: Minor > Labels: backward-incompatible > Fix For: Not applicable > > > Dynamic work rebalancing works by 1) determining how long the bundle should take in order to not be a straggler - the "deadline", 2) predicting where the bundle will be (position or fraction) by that deadline, and 3) requesting an atomic split (splitAtFraction). > Currently all BoundedSource's and (in Dataflow runner) NativeReaderIterator's refuse splits if they have already consumed the requested split position. > Splitting a task [A, C) at position B generates [A, B) and [B, C), so if we predict that by deadline the task will have last consumed position X, we should split not "at" X, but "after" X (i.e. at next(X)) - i.e. into [A, X] (because X is already consumed) and (X, C) equivalently [A, next(X)) and [next(X), C). > One way to fit this into the current BoundedSource API is to rename splitAtFraction to splitAfterFraction and adjust the documentation. Documentation of getFractionConsumed also needs to be clarified to emphasize that it should return what fraction of all positions in the source have already been consumed, including the position of the last consumed record. For example, for an index-range task with range [0, 5), after it has read the first record at position 0, it has consumed 20%, rather than 0% (and of course not 40% even if an internal "next index" variable is now 1 - this mistake is especially easy to make in a file-based source if you base the calculations on the file's offset *after* consuming the record - the correct way is to calculate based on offsets of beginning of records). -- This message was sent by Atlassian JIRA (v6.3.15#6346)