cassandra-commits mailing list archives

From "Zhu Han (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3248) CommitLog writer should call fdatasync instead of fsync
Date Fri, 23 Sep 2011 14:20:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113466#comment-13113466
] 

Zhu Han commented on CASSANDRA-3248:
------------------------------------

It is very strange that there is no difference between fdatasync and fsync when overwriting a
preallocated file. The result is probably highly dependent on the underlying file system implementation.


IMHO, we should not fix this issue until it has been tested thoroughly on different file systems.

{quote}
$ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=512M --file-fsync-all=on
--file-fsync-mode=fsync --file-test-mode=seqrewr --max-time=600 --file-block-size=2K  --max-requests=0
prepare

$ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=512M --file-fsync-all=on
--file-fsync-mode={color:red}fsync{color} --file-test-mode=seqrewr --max-time=600 --file-block-size=2K
 --max-requests=0 run

Operations performed:  0 Read, 29384 Write, 29384 Other = 58768 Total
Read 0b  Written 57.391Mb  Total transferred 57.391Mb  (97.943Kb/sec)
   48.97 Requests/sec executed

    per-request statistics:
         min:                                 12.94ms
         avg:                                 20.42ms
         max:                                125.02ms
         approx.  95 percentile:              25.02ms

$ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=512M --file-fsync-all=on
--file-fsync-mode={color:red}fdatasync{color} --file-test-mode=seqrewr --max-time=600 --file-block-size=2K
 --max-requests=0 run


Operations performed:  0 Read, 29307 Write, 29307 Other = 58614 Total
Read 0b  Written 57.24Mb  Total transferred 57.24Mb  (97.688Kb/sec)
   48.84 Requests/sec executed

    per-request statistics:
         min:                                 16.21ms
         avg:                                 20.47ms
         max:                                116.69ms
         approx.  95 percentile:              25.02ms


{quote}
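
To repeat the comparison above from the JVM on other file systems, a small probe along the following lines could be used. This is only a sketch under stated assumptions: the class name OverwriteSyncProbe, its method names, and the block count are invented for illustration and are not part of Cassandra. It relies on the observation in the issue description that FileChannel#force(false) maps to fdatasync() on Linux, and it assumes force(true) behaves like fsync(). The 2K block size mirrors the sysbench runs. Point it at a directory on the log device; a tmpfs directory would make both calls nearly free.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not Cassandra code.
final class OverwriteSyncProbe {

    // Creates a fully written file of blocks * blockSize bytes and syncs it,
    // so that later writes overwrite existing blocks instead of extending the file.
    public static Path preallocate(Path dir, int blocks, int blockSize) {
        Path file = dir.resolve("overwrite-probe-" + System.nanoTime() + ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer zeros = ByteBuffer.allocate(blockSize);
            for (int i = 0; i < blocks; i++) {
                zeros.clear();
                while (zeros.hasRemaining()) {
                    ch.write(zeros);
                }
            }
            ch.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // Overwrites the file block by block, syncing after every block.
    // syncMetadata=true calls force(true); false calls force(false).
    // Returns the elapsed time in nanoseconds.
    public static long timeOverwrite(Path file, int blocks, int blockSize,
                                     boolean syncMetadata) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(blockSize);
            long start = System.nanoTime();
            for (int i = 0; i < blocks; i++) {
                buf.clear();
                long pos = (long) i * blockSize;
                while (buf.hasRemaining()) {
                    pos += ch.write(buf, pos);
                }
                ch.force(syncMetadata);
            }
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Optional first argument: directory on the device under test.
        Path dir = Paths.get(args.length > 0 ? args[0]
                                             : System.getProperty("java.io.tmpdir"));
        int blocks = 256;
        int blockSize = 2048;
        Path file = preallocate(dir, blocks, blockSize);
        try {
            long withMetadata = timeOverwrite(file, blocks, blockSize, true);
            long dataOnly = timeOverwrite(file, blocks, blockSize, false);
            System.out.printf("force(true):  %.2f ms/op%n", withMetadata / 1e6 / blocks);
            System.out.printf("force(false): %.2f ms/op%n", dataOnly / 1e6 / blocks);
        } finally {
            file.toFile().delete();
        }
    }
}
```

Running it against directories on several file systems would show whether the gap depends on the file system, which is the question raised above. It is a rough probe and not a replacement for the sysbench runs.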

> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3248
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log device. It depends
on FileDescriptor#sync(), which invokes fsync() and therefore also forces the file attributes to disk.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata unless that
metadata is needed in order to allow a subsequent data retrieval to be correctly handled.
For example, changes to st_atime or st_mtime (respectively, time of last access and time
of last modification; see stat(2)) do not require flushing because they are not necessary
for a subsequent data read to be handled correctly. On the other hand, a change to the file
size (st_size, as made by say ftruncate(2)), would require a metadata flush.
> The file size is synced to disk by fdatasync() as well. Although the commit log recovery logic
sorts the commit log segments by their modification timestamp, that sorting can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris, FileChannel#force(false) invokes
fdatasync(). On Windows, the false flag does not have any impact.
> On my log device (commodity SATA HDD, write cache disabled), there is a large performance
gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G --file-fsync-all=on
--file-fsync-mode={color:red}fdatasync{color} --file-test-mode=seqwr --max-time=600 --file-block-size=2K
 --max-requests=0 run
> {color:blue}54.90{color} Requests/sec executed
>    per-request statistics:
>          min:                                  8.29ms
>          avg:                                 18.18ms
>          max:                                108.36ms
>          approx.  95 percentile:              25.02ms
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G --file-fsync-all=on
--file-fsync-mode={color:red}fsync{color} --file-test-mode=seqwr --max-time=600 --file-block-size=2K
 --max-requests=0 run
> {color:blue}28.08{color} Requests/sec executed
>     per-request statistics:
>          min:                                 33.28ms
>          avg:                                 35.61ms
>          max:                                911.87ms
>          approx.  95 percentile:              41.69ms
> {quote}
> I do think this is a very critical performance improvement.
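
The two sync paths named in the description can be put side by side in a short sketch. It is illustrative only: CommitLogSyncSketch and appendAndSync are hypothetical names, and this is not Cassandra's SequentialWriter. It uses just the standard RandomAccessFile, FileDescriptor#sync() and FileChannel#force(boolean) calls. Per the description, force(false) reaches fdatasync() on Linux and Solaris, while on Windows the flag makes no difference.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical illustration, not Cassandra code.
final class CommitLogSyncSketch {

    // Appends one record to the file and makes it durable.
    // dataOnly=false: FileDescriptor#sync(), the path the issue says is used today.
    // dataOnly=true:  FileChannel#force(false), the proposed replacement.
    public static void appendAndSync(File file, byte[] record, boolean dataOnly) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(raf.length());
            raf.write(record);
            if (dataOnly) {
                raf.getChannel().force(false);
            } else {
                raf.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because an append changes the file size, the quoted man page text implies that fdatasync() still flushes that piece of metadata. Only timestamps such as st_mtime may be left unflushed, which is why the mtime-based segment ordering is relevant here.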

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
