flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-9138) Enhance BucketingSink to also flush data by time interval
Date Tue, 24 Apr 2018 14:50:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449984#comment-16449984 ]

ASF GitHub Bot commented on FLINK-9138:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5860#discussion_r183753866
  
    --- Diff: flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java ---
    @@ -87,9 +87,11 @@
      * and a rolling counter. For example the file {@code "part-1-17"} contains the data from
      * {@code subtask 1} of the sink and is the {@code 17th} bucket created by that subtask. Per default
      * the part prefix is {@code "part"} but this can be configured using {@link #setPartPrefix(String)}.
    - * When a part file becomes bigger than the user-specified batch size the current part file is closed,
    - * the part counter is increased and a new part file is created. The batch size defaults to {@code 384MB},
    - * this can be configured using {@link #setBatchSize(long)}.
    + * When a part file becomes bigger than the user-specified batch size or when the part file becomes older
    + * than the user-specified roll over interval the current part file is closed,the part counter is increased
    --- End diff --
    
    Add space `closed,the` -> `closed, the`


> Enhance BucketingSink to also flush data by time interval
> ---------------------------------------------------------
>
>                 Key: FLINK-9138
>                 URL: https://issues.apache.org/jira/browse/FLINK-9138
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector
>    Affects Versions: 1.4.2
>            Reporter: Narayanan Arunachalam
>            Priority: Major
>
> BucketingSink currently supports flushing data to the file system by size limit and by
> period of inactivity. It would also be useful to flush data after a specified time interval.
> This way, data is written out even when write throughput is low but there are no significant
> time gaps between the writes. This reduces the delay before data becomes available in the
> file system and should also help move checkpoints along faster.
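
The three flush triggers discussed above (size limit, period of inactivity, and the proposed time interval) can be sketched as a single roll decision. This is a minimal, self-contained illustration and not BucketingSink's actual code: the class name RollPolicySketch, its fields, and the shouldRoll method are all hypothetical names chosen for this example. Only the 384MB default batch size comes from the Javadoc quoted in the diff.

```java
// Illustrative sketch only. All names here are hypothetical and are not
// part of the Flink API or of the pull request under review.
final class RollPolicySketch {
    private final long maxPartSizeBytes;      // roll when the part file grows past this size
    private final long rolloverIntervalMs;    // roll when the part file is older than this
    private final long inactivityThresholdMs; // roll when nothing was written for this long

    RollPolicySketch(long maxPartSizeBytes, long rolloverIntervalMs, long inactivityThresholdMs) {
        this.maxPartSizeBytes = maxPartSizeBytes;
        this.rolloverIntervalMs = rolloverIntervalMs;
        this.inactivityThresholdMs = inactivityThresholdMs;
    }

    /** Returns true if the current part file should be closed and a new one started. */
    boolean shouldRoll(long partSizeBytes, long createdAtMs, long lastWriteAtMs, long nowMs) {
        // Existing trigger: the part file became bigger than the batch size.
        if (partSizeBytes > maxPartSizeBytes) {
            return true;
        }
        // Proposed trigger: the part file became older than the rollover interval,
        // regardless of how recently it was written to.
        if (nowMs - createdAtMs > rolloverIntervalMs) {
            return true;
        }
        // Existing trigger: no writes for longer than the inactivity threshold.
        return nowMs - lastWriteAtMs > inactivityThresholdMs;
    }
}
```

The age check covers the case the issue describes: a steady trickle of small writes never reaches the size limit and never looks inactive, so without it the part file would stay open indefinitely.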



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
