hadoop-hdfs-issues mailing list archives

From "John Thiltges (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-13294) Flushing writes to disk with libhdfs
Date Thu, 15 Mar 2018 15:09:00 GMT
John Thiltges created HDFS-13294:

             Summary: Flushing writes to disk with libhdfs
                 Key: HDFS-13294
                 URL: https://issues.apache.org/jira/browse/HDFS-13294
             Project: Hadoop HDFS
          Issue Type: Wish
          Components: libhdfs
            Reporter: John Thiltges

I'm working with an FTP server that writes into HDFS using libhdfs. I'd like to ensure that
incoming files are persisted on datanode disks before returning success to clients. At present,
power failures often mean lost blocks for recent uploads.

The hsync() call and the CreateFlag.SYNC_BLOCK create flag seem like the right direction, but
there doesn't appear to be a way to set SYNC_BLOCK through the libhdfs interface. I believe
hsync() only applies to the current block of a file handle.

Thoughts on implementing it:
 # Use an existing 'close enough' fcntl flag to set SYNC_BLOCK?
    Maybe O_DIRECT, O_SYNC, or O_DSYNC.
    This would probably be the best option, as it would keep the libhdfs interface the same,
    and older versions would ignore the flags.
 # Add an hdfsOpenFile2 that accepts HDFS flags (instead of fcntl flags)?
 # Provide a method in DFSOutputStream to set shouldSyncBlock on an existing stream, and a
    function in libhdfs to enable it?

For flushing writes with libhdfs right now (using CDH5), I'm guessing my only option is to
call hsync() after every 'block size' of writes, exactly on the boundary.
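
The workaround above could be sketched roughly as follows. This is only the bookkeeping: the write and sync callbacks are hypothetical stand-ins that, in a real client, would wrap hdfsWrite() and hdfsHSync() from hdfs.h, and the counting stubs exist only so the sketch compiles without libhdfs. Whether an hsync() issued exactly on the boundary really covers the block that was just filled is my guess and is not verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks; a real client would wrap hdfsWrite()/hdfsHSync(). */
typedef int (*bsw_write_fn)(void *ctx, const char *buf, size_t len);
typedef int (*bsw_sync_fn)(void *ctx);

typedef struct {
    void        *ctx;         /* passed through to the callbacks */
    bsw_write_fn write;
    bsw_sync_fn  sync;
    size_t       block_size;  /* HDFS block size of the file, in bytes */
    size_t       in_block;    /* bytes written into the current block */
} block_sync_writer;

/* Write len bytes, splitting at block boundaries and syncing on each one. */
int bsw_write(block_sync_writer *w, const char *buf, size_t len)
{
    assert(w->block_size > 0);
    while (len > 0) {
        size_t room = w->block_size - w->in_block;
        size_t n = len < room ? len : room;
        if (w->write(w->ctx, buf, n) != 0) {
            return -1;
        }
        buf += n;
        len -= n;
        w->in_block += n;
        if (w->in_block == w->block_size) {
            /* Exactly on the boundary: sync before starting the next block. */
            if (w->sync(w->ctx) != 0) {
                return -1;
            }
            w->in_block = 0;
        }
    }
    return 0;
}

/* Sync the final, partially filled block before closing the file. */
int bsw_finish(block_sync_writer *w)
{
    if (w->in_block > 0) {
        if (w->sync(w->ctx) != 0) {
            return -1;
        }
        w->in_block = 0;
    }
    return 0;
}

/* In-memory stand-ins that only count calls, so no HDFS is needed. */
struct bsw_counters { size_t bytes; size_t syncs; };

int bsw_counting_write(void *ctx, const char *buf, size_t len)
{
    (void)buf;
    ((struct bsw_counters *)ctx)->bytes += len;
    return 0;
}

int bsw_counting_sync(void *ctx)
{
    ((struct bsw_counters *)ctx)->syncs += 1;
    return 0;
}
```

The FTP server would call bsw_write() for each incoming buffer and bsw_finish() before closing the file, returning success to the client only if both succeed.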

Best regards,
