cassandra-commits mailing list archives

From "Zhu Han (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3248) CommitLog writer should call fdatasync instead of fsync
Date Thu, 27 Oct 2011 13:52:32 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137148#comment-13137148
] 

Zhu Han commented on CASSANDRA-3248:
------------------------------------

Is the write cache on the SATA drives turned off? If not, the durability of the file system
is not assured.

5K flush operations on a SATA disk without NVRAM is surprising to me.
                
> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3248
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>            Assignee: Brandon Williams
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log device. It depends
on FileDescriptor#sync(), which invokes fsync() and therefore also forces the file attributes to disk.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata unless that
metadata is needed in order to allow a subsequent data retrieval to be correctly handled.
For example, changes to st_atime or st_mtime (respectively, time of last access and time
of last modification; see stat(2)) do not require flushing because they are not necessary
for a subsequent data read to be handled correctly. On the other hand, a change to the file
size (st_size, as made by say ftruncate(2)), would require a metadata flush.
> The file size is also synced to disk by fdatasync(). Although the commit log recovery logic
sorts the commit log segments by their modification timestamp, it can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris, FileChannel#force(false) invokes
fdatasync(). On Windows, the false flag has no effect.
> On my log device (commodity SATA HDD, write cache disabled), there is a large performance
gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on
--file-fsync-mode=fdatasync --file-test-mode=seqwr --max-time=600 --file-block-size=2K
 --max-requests=0 run
> 54.90 Requests/sec executed
>    per-request statistics:
>          min:                                  8.29ms
>          avg:                                 18.18ms
>          max:                                108.36ms
>          approx.  95 percentile:              25.02ms
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on
--file-fsync-mode=fsync --file-test-mode=seqwr --max-time=600 --file-block-size=2K
 --max-requests=0 run
> 28.08 Requests/sec executed
>     per-request statistics:
>          min:                                 33.28ms
>          avg:                                 35.61ms
>          max:                                911.87ms
>          approx.  95 percentile:              41.69ms
> {quote}
> I do think this is a very critical performance improvement.
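
To illustrate the FileChannel#force(false) path mentioned in the description, here is a minimal
sketch of a data-only sync after an append. The class name CommitLogSyncSketch and the helper
appendAndSync are hypothetical and are not taken from the Cassandra code base. Whether
force(false) actually maps to fdatasync() depends on the JRE and platform, as noted in the
description.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommitLogSyncSketch {
    // Hypothetical helper (not Cassandra code): appends one record to the log
    // file, forces it to the storage device, and returns the resulting file size.
    public static long appendAndSync(Path log, byte[] record, boolean syncMetadata)
            throws IOException {
        try (FileChannel channel = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            // A single write() call may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            // force(false) asks only for the file content to be written;
            // force(true) additionally asks for file metadata to be written.
            channel.force(syncMetadata);
            return channel.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("commitlog-sketch", ".log");
        try {
            // 2K record, matching the block size used in the sysbench runs.
            long size = appendAndSync(log, new byte[2048], false);
            System.out.println("file size after force(false): " + size);
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

Passing true instead of false gives the metadata-inclusive behavior, which makes it easy to
compare the two modes on a given log device.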

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira