flink-dev mailing list archives

From "Wenlong Lyu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-5284) Make output of bucketing sink compatible with other processing framework like mapreduce
Date Thu, 08 Dec 2016 10:01:09 GMT
Wenlong Lyu created FLINK-5284:
----------------------------------

             Summary: Make output of bucketing sink compatible with other processing framework
like mapreduce
                 Key: FLINK-5284
                 URL: https://issues.apache.org/jira/browse/FLINK-5284
             Project: Flink
          Issue Type: Improvement
          Components: filesystem-connector
            Reporter: Wenlong Lyu
            Assignee: Wenlong Lyu


Currently the bucketing sink cannot move in-progress and pending files to the final output when
the stream finishes. In addition, after recovery the current output file will contain some invalid
content, which can only be identified through the file-length meta file. Together these make the
final output of the job incompatible with other processing frameworks such as MapReduce. There are
two things to do to solve the problem:
1. Add a direct output option to the bucketing sink, which writes output to the final file and
deletes or truncates that file on failover. Direct output would be especially useful for finite
stream jobs, since it enables users to migrate their batch jobs to streaming and take
advantage of features such as checkpointing.
2. Add a truncate-by-copy option that enables the bucketing sink to resize an output file by
copying the valid content of the current file instead of creating a length meta file. Truncate by
copy incurs some extra I/O, but makes the output cleaner.
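
The truncate-by-copy idea in item 2 could look roughly like the sketch below. It is only an
illustration: it uses java.nio on a local file as a stand-in for the file system the sink
actually writes to, and the class name, method name, and ".truncating" suffix are hypothetical,
not part of the bucketing sink.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class TruncateByCopy {

    /**
     * Keeps only the first validLength bytes of the file by copying them
     * to a sibling temp file and then renaming that file over the original.
     * (Hypothetical helper; a local-file stand-in for the real sink logic.)
     */
    public static void truncateByCopy(Path file, long validLength) throws IOException {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must be >= 0");
        }
        // Assumed naming scheme for the temporary copy.
        Path tmp = file.resolveSibling(file.getFileName() + ".truncating");

        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(tmp,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            // Never copy more than the file actually holds.
            long toCopy = Math.min(validLength, in.size());
            long copied = 0;
            while (copied < toCopy) {
                long n = in.transferTo(copied, toCopy - copied, out);
                if (n <= 0) {
                    throw new IOException("Could not copy valid prefix of " + file);
                }
                copied += n;
            }
        }
        // Replace the original so readers only ever see the valid content.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

After such a call the file itself contains only valid data, so a downstream reader needs no
length meta file, at the cost of rewriting the valid prefix once.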




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
