flume-dev mailing list archives

From "Illes S (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLUME-2900) Allow triggering hsync for HDFS sink during write
Date Sun, 17 Apr 2016 18:43:25 GMT
Illes S created FLUME-2900:

             Summary: Allow triggering hsync for HDFS sink during write
                 Key: FLUME-2900
                 URL: https://issues.apache.org/jira/browse/FLUME-2900
             Project: Flume
          Issue Type: Wish
          Components: Sinks+Sources
            Reporter: Illes S
            Priority: Minor

The HDFS sink calls {{hflush()}} (or {{sync()}}) on the {{FSDataOutputStream}}. This flushes
client buffers, but does not update the output file size on the NameNode (see HDFS-5478) while
the file is being written; the size is updated only after the file is closed.

It would be nice to allow users to trigger updating the file length (which also syncs file
data to disk, see HDFS-4213):

{{((HdfsDataOutputStream) fos).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));}}

This could be done via new {{hdfs.hsyncInterval}}, {{hdfs.hsyncSize}} and {{hdfs.hsyncCount}}
configuration options.
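
To make the proposal concrete, here is a minimal sketch of how the three thresholds could be
tracked. Everything in it is an assumption, not existing Flume code: the class {{HsyncPolicy}}
and its method {{onEventWritten}} are hypothetical names, the thresholds are assumed to mirror
the existing roll options (time, bytes, event count), and 0 is assumed to disable a threshold.
The actual sync is passed in as a callback so the sketch runs without Hadoop on the classpath.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: decides when the sink should trigger an hsync. */
final class HsyncPolicy {
    private final long intervalMillis; // proposed hdfs.hsyncInterval (assumed unit; 0 disables)
    private final long sizeBytes;      // proposed hdfs.hsyncSize (0 disables)
    private final long eventCount;     // proposed hdfs.hsyncCount (0 disables)
    private final LongSupplier clock;  // current time in milliseconds
    private final Runnable hsyncAction;

    private long lastSyncMillis;
    private long bytesSinceSync;
    private long eventsSinceSync;

    HsyncPolicy(long intervalMillis, long sizeBytes, long eventCount,
                LongSupplier clock, Runnable hsyncAction) {
        this.intervalMillis = intervalMillis;
        this.sizeBytes = sizeBytes;
        this.eventCount = eventCount;
        this.clock = clock;
        this.hsyncAction = hsyncAction;
        this.lastSyncMillis = clock.getAsLong();
    }

    /** Call after each event is written; returns true if an hsync was triggered. */
    boolean onEventWritten(long bytes) {
        bytesSinceSync += bytes;
        eventsSinceSync++;
        boolean due =
            (sizeBytes > 0 && bytesSinceSync >= sizeBytes)
            || (eventCount > 0 && eventsSinceSync >= eventCount)
            || (intervalMillis > 0 && clock.getAsLong() - lastSyncMillis >= intervalMillis);
        if (due) {
            // In the sink, this callback would wrap the call from the description:
            // ((HdfsDataOutputStream) fos).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
            // and would have to handle the IOException that hsync can throw.
            hsyncAction.run();
            bytesSinceSync = 0;
            eventsSinceSync = 0;
            lastSyncMillis = clock.getAsLong();
        }
        return due;
    }
}
```

Note that this sketch only checks the time threshold when an event is written; a real
implementation would probably need a timer so that an idle file is still synced.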

A workaround is to roll the output file more often, but that leads to many small files, which
may be worse than the extra load put on the NameNode by calling {{hsync(...)}} multiple times
during a write, right?
