Date: Wed, 29 Nov 2017 01:13:00 +0000 (UTC)
From: "Eugene Kirpichov (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Commented] (BEAM-3030) watchForNewFiles() can emit a file multiple times if it's growing

    [ https://issues.apache.org/jira/browse/BEAM-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269863#comment-16269863 ]

Eugene Kirpichov commented on BEAM-3030:
----------------------------------------

Fix in https://github.com/apache/beam/pull/4190

> watchForNewFiles() can emit a file multiple times if it's growing
> -----------------------------------------------------------------
>
>                 Key: BEAM-3030
>                 URL: https://issues.apache.org/jira/browse/BEAM-3030
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>             Fix For: 2.3.0
>
>
> TextIO and AvroIO watchForNewFiles(), as well as FileIO.match().continuously(), use the Watch transform under the hood and watch the set of Metadata matching a filepattern.
> Two Metadata objects with the same filename but different sizes are not considered equal, so if these transforms observe the same file multiple times with different sizes, they'll read the file multiple times.
> This is likely not yet a problem for production users: these features require SDF, which is supported only in the Dataflow runner, and users of the Dataflow runner are likely to use only files on GCS, which doesn't support appends. However, this still needs to be fixed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
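
The equality problem described in the quoted issue can be reproduced without Beam. The sketch below is only an illustration: FileMeta is a hypothetical stand-in for a metadata record (it is not Beam's Metadata class), and the emitByRecord and emitByName helpers are invented names. Deduplicating on the filename alone is one possible remedy, shown for contrast; it is not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class GrowingFileDedup {
  /** Hypothetical stand-in for a file metadata record; not Beam's actual class. */
  static final class FileMeta {
    final String name;
    final long sizeBytes;

    FileMeta(String name, long sizeBytes) {
      this.name = name;
      this.sizeBytes = sizeBytes;
    }

    // Equality covers both fields, so the same file at a new size is a "different" value.
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof FileMeta)) {
        return false;
      }
      FileMeta other = (FileMeta) o;
      return name.equals(other.name) && sizeBytes == other.sizeBytes;
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, sizeBytes);
    }
  }

  /** Deduplicates on the whole record: a growing file is emitted once per observed size. */
  static List<String> emitByRecord(List<FileMeta> polls) {
    Set<FileMeta> seen = new HashSet<>();
    List<String> emitted = new ArrayList<>();
    for (FileMeta m : polls) {
      if (seen.add(m)) {
        emitted.add(m.name);
      }
    }
    return emitted;
  }

  /** Deduplicates on the filename only: each file is emitted at most once. */
  static List<String> emitByName(List<FileMeta> polls) {
    Set<String> seen = new HashSet<>();
    List<String> emitted = new ArrayList<>();
    for (FileMeta m : polls) {
      if (seen.add(m.name)) {
        emitted.add(m.name);
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<FileMeta> polls = new ArrayList<>();
    polls.add(new FileMeta("logs/a.txt", 100)); // first poll
    polls.add(new FileMeta("logs/a.txt", 250)); // same file, grown since the first poll
    polls.add(new FileMeta("logs/b.txt", 40));

    System.out.println(emitByRecord(polls)); // [logs/a.txt, logs/a.txt, logs/b.txt]
    System.out.println(emitByName(polls));   // [logs/a.txt, logs/b.txt]
  }
}
```

With whole-record equality, logs/a.txt is emitted twice because its size changed between polls. Keying on the name emits it once, at the cost of never picking up data appended later.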