Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7AF7A200B92 for ; Wed, 28 Sep 2016 15:27:22 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 79D0F160AB4; Wed, 28 Sep 2016 13:27:22 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C4DA1160ADD for ; Wed, 28 Sep 2016 15:27:21 +0200 (CEST) Received: (qmail 38377 invoked by uid 500); 28 Sep 2016 13:27:20 -0000 Mailing-List: contact issues-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list issues@flink.apache.org Received: (qmail 38246 invoked by uid 99); 28 Sep 2016 13:27:20 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Sep 2016 13:27:20 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 9B99D2C2A69 for ; Wed, 28 Sep 2016 13:27:20 +0000 (UTC) Date: Wed, 28 Sep 2016 13:27:20 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@flink.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (FLINK-4329) Fix Streaming File Source Timestamps/Watermarks Handling MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 28 Sep 2016 13:27:22 -0000 [ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529592#comment-15529592 ] ASF GitHub Bot commented on FLINK-4329: --------------------------------------- Github user kl0u commented on the issue: https://github.com/apache/flink/pull/2546 Hi @StephanEwen , thanks for the review! The watermarks/timestamps are now generated by the Reader, and not the operator that creates the splits. The same holds for the LongMax watermark, which is created at the close() of the ContinuousFileReaderOperator. As for tests, it is the testFileReadingOperatorWithIngestionTime() in the ContinuousFileMonitoringTest which checks if the last Watermark is the LongMax. The original problem was that there were no timestamps assigned to the elements for Ingestion time and watermarks were emitted (I think it was a Process_once case). > Fix Streaming File Source Timestamps/Watermarks Handling > -------------------------------------------------------- > > Key: FLINK-4329 > URL: https://issues.apache.org/jira/browse/FLINK-4329 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors > Affects Versions: 1.1.0 > Reporter: Aljoscha Krettek > Assignee: Kostas Kloudas > Fix For: 1.2.0, 1.1.3 > > > The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks, i.e. they are just passed through. This means that when the {{ContinuousFileMonitoringFunction}} closes and emits a {{Long.MAX_VALUE}} that watermark can "overtake" the records that are to be emitted in the {{ContinuousFileReaderOperator}}. Together with the new "allowed lateness" setting in window operator this can lead to elements being dropped as late. > Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion timestamps since it is not technically a source but looks like one to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)