flink-issues mailing list archives

From "Timo Walther (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-9113) Data loss in BucketingSink when writing to local filesystem
Date Mon, 23 Apr 2018 14:49:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448259#comment-16448259 ]

Timo Walther commented on FLINK-9113:

Fixed in 1.4.3: 896797be71e920110aa270402d031969126221bb

> Data loss in BucketingSink when writing to local filesystem
> -----------------------------------------------------------
>                 Key: FLINK-9113
>                 URL: https://issues.apache.org/jira/browse/FLINK-9113
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.3
> For local filesystems, it is not guaranteed that the data is flushed to disk during checkpointing.
> This leads to data loss on TaskManager failures when writing to a local filesystem
> ({{org.apache.hadoop.fs.LocalFileSystem}}). The {{flush()}} method returns a written length,
> but the data is not necessarily written to the file yet (thus the valid length recorded at
> checkpoint time might be greater than the actual file size). {{hsync}} and {{hflush}} have no
> effect either.
> It seems this behavior won't be fixed in the near future: https://issues.apache.org/jira/browse/HADOOP-7844
> One solution would be to call {{close()}} on every checkpoint for local filesystems, even
> though this would decrease performance. If we don't fix this issue, we should at least
> document it properly.
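
The gap described above is not specific to Hadoop's {{LocalFileSystem}}: on any local filesystem, {{flush()}} only hands buffered bytes to the OS and does not guarantee they reached the disk; only an fsync (or {{close()}}) does. A minimal sketch of this distinction using plain {{java.io}} (not the Hadoop API; the class and method names below are illustrative, not from Flink):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSync {

    /**
     * Writes data and forces it to disk, returning the on-disk file size.
     * flush() alone pushes the user-space buffer to the OS but gives no
     * durability guarantee after a machine crash (the gap HADOOP-7844
     * describes); getFD().sync() issues the actual fsync.
     */
    public static long writeDurably(Path path, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(data);
            out.flush();        // drains the BufferedOutputStream to the OS
            fos.getFD().sync(); // forces the bytes to the physical device
            return Files.size(path);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bucketing-sink-demo", ".txt");
        byte[] data = "checkpointed record".getBytes();
        // After the sync, the file size matches the "valid length" that a
        // checkpoint would record, so recovery would not truncate past EOF.
        System.out.println(writeDurably(path, data) == data.length);
        Files.deleteIfExists(path);
    }
}
```

This is why calling {{close()}} on each checkpoint works as a fallback: closing the stream drains all buffers, at the cost of reopening files after every checkpoint.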

This message was sent by Atlassian JIRA
