flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4329) Fix Streaming File Source Timestamps/Watermarks Handling
Date Wed, 28 Sep 2016 13:27:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529592#comment-15529592

ASF GitHub Bot commented on FLINK-4329:

Github user kl0u commented on the issue:

    Hi @StephanEwen , thanks for the review!
    The watermarks/timestamps are now generated by the Reader, and not the operator that creates
the splits. The same holds for the LongMax watermark, which is created at the close() of the
    As for tests, it is the testFileReadingOperatorWithIngestionTime() in the ContinuousFileMonitoringTest
which checks if the last Watermark is the LongMax.
    The original problem was that there were no timestamps assigned to the elements for Ingestion
time and watermarks were emitted (I think it was a Process_once case).

> Fix Streaming File Source Timestamps/Watermarks Handling
> --------------------------------------------------------
>                 Key: FLINK-4329
>                 URL: https://issues.apache.org/jira/browse/FLINK-4329
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.1.0
>            Reporter: Aljoscha Krettek
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0, 1.1.3
> The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks, i.e. they
are just passed through. This means that when the {{ContinuousFileMonitoringFunction}} closes
and emits a {{Long.MAX_VALUE}} that watermark can "overtake" the records that are to be emitted
in the {{ContinuousFileReaderOperator}}. Together with the new "allowed lateness" setting
in window operator this can lead to elements being dropped as late.
> Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion timestamps
since it is not technically a source but looks like one to the user.

This message was sent by Atlassian JIRA

View raw message