Date: Wed, 2 Aug 2017 20:29:00 +0000 (UTC)
From: "Eugene Kirpichov (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Created] (BEAM-2716) AvroReader should refuse dynamic splits while in the last block

Eugene Kirpichov created BEAM-2716:
--------------------------------------

             Summary: AvroReader should refuse dynamic splits while in the last block
                 Key: BEAM-2716
                 URL: https://issues.apache.org/jira/browse/BEAM-2716
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov
            Priority: Minor

AvroReader is able to detect when it's in the last block:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java#L728

It could also use this information to avoid wastefully producing dynamic splits starting in the range of the current block.

One way to do this would be to have OffsetRangeTracker have a "claim range" operation: claim range of [a, b) is, in terms of correctness, equivalent to claiming "a" (it checks whether "a" is within the range), but sets the last claimed position to "b" rather than "a", thus protecting more positions from being split away.
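
To make the proposed "claim range" operation concrete, here is a minimal sketch in Java. It is not Beam's OffsetRangeTracker: the class ClaimRangeTrackerSketch and its methods tryClaim, tryClaimRange, and trySplitAt are hypothetical names chosen for illustration, and the split rule (refuse any split at or before the last claimed position) is a simplifying assumption about how such a tracker might behave.

```java
/**
 * Hypothetical, simplified offset range tracker over [start, stop).
 * Illustrates the "claim range" idea from this issue; it is NOT the
 * real org.apache.beam.sdk.io.range.OffsetRangeTracker.
 */
final class ClaimRangeTrackerSketch {
  private final long start;
  private long stop;              // exclusive; shrinks after a successful split
  private long lastClaimed = -1;  // -1 means nothing has been claimed yet

  ClaimRangeTrackerSketch(long start, long stop) {
    if (start < 0 || stop < start) {
      throw new IllegalArgumentException("need 0 <= start <= stop");
    }
    this.start = start;
    this.stop = stop;
  }

  /** Ordinary claim: checks "a" and records "a" as the last claimed position. */
  synchronized boolean tryClaim(long a) {
    return claim(a, a);
  }

  /**
   * Claim of [a, b): the same check as claiming "a", but records "b" as the
   * last claimed position, as the issue text describes. Recording "b" itself
   * is slightly conservative: in this sketch a split exactly at "b" is also
   * refused.
   */
  synchronized boolean tryClaimRange(long a, long b) {
    if (b <= a) {
      throw new IllegalArgumentException("range [a, b) must be non-empty");
    }
    return claim(a, b);
  }

  private boolean claim(long a, long newLastClaimed) {
    if (a < Math.max(start, lastClaimed)) {
      throw new IllegalArgumentException("claims must not move backwards");
    }
    if (a >= stop) {
      return false;  // "a" is outside the (possibly shrunken) range
    }
    lastClaimed = newLastClaimed;
    return true;
  }

  /** Dynamic split: on success the tracker keeps [start, splitOffset). */
  synchronized boolean trySplitAt(long splitOffset) {
    if (lastClaimed < 0) {
      return false;  // assumption: no splits before the first claim
    }
    if (splitOffset <= lastClaimed || splitOffset >= stop) {
      return false;  // position already protected, or outside the range
    }
    stop = splitOffset;
    return true;
  }
}
```

Under these assumptions, a reader that has claimed only the block start at offset 100 would accept a split at 150, even though the block covering [100, 200) is already being consumed. After tryClaimRange(100, 200), the same split is refused, while a split at a later offset such as 500 still succeeds. If the reader is in the last block and "b" reaches or passes the end of the range, every split attempt is refused, which is the behaviour the summary asks for.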