flink-issues mailing list archives

From StephanEwen <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2583] Add Stream Sink For Rolling HDFS ...
Date Thu, 03 Sep 2015 09:44:24 GMT
Github user StephanEwen commented on the pull request:

    I think using truncate for exactly-once is the way to go. To support users with older HDFS versions, how about this:
    1. We consider as valid only what was written successfully at a checkpoint (hflush/hsync). When we roll over to a new file on restart, we write a `.length` file for the previous file that indicates how many bytes of it are valid, basically simulating truncate by adding a metadata file (a sketch of this follows below the list).
    2. Optionally, the user can activate a merge on roll-over that takes all the files from the attempts and all the metadata files and merges them into one file. This roll-over can be written so that it works incrementally, re-tries on failures, etc.
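A minimal sketch of the `.length` marker idea from point 1, assuming Hadoop's `FileSystem` API, a hypothetical `ValidLengthHelper` class, and a `<part-file>.length` naming convention (none of these are taken from the actual sink code), might look like this:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthHelper {

    /**
     * Called on restore: records how many bytes of the abandoned part file
     * were covered by the last completed checkpoint. Readers consult this
     * marker instead of relying on truncate().
     */
    public static void writeValidLength(FileSystem fs, Path partFile, long validBytes)
            throws IOException {
        Path lengthFile = new Path(partFile.getParent(), partFile.getName() + ".length");
        try (FSDataOutputStream out = fs.create(lengthFile, true /* overwrite */)) {
            out.writeLong(validBytes);
            out.hflush(); // make the marker durable before rolling to a new part file
        }
    }

    /**
     * Called by a downstream reader, or by the optional merge step from
     * point 2: returns the number of valid bytes, falling back to the full
     * file length when no ".length" marker exists.
     */
    public static long readValidLength(FileSystem fs, Path partFile) throws IOException {
        Path lengthFile = new Path(partFile.getParent(), partFile.getName() + ".length");
        if (!fs.exists(lengthFile)) {
            return fs.getFileStatus(partFile).getLen();
        }
        try (FSDataInputStream in = fs.open(lengthFile)) {
            return in.readLong();
        }
    }
}
```

A consumer (or the merge-on-roll-over step) would call `readValidLength` and stop reading the part file at that offset, which gives the same result as truncating the file in place.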

