cassandra-commits mailing list archives

From "Peter Schuller (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3248) CommitLog writer should call fdatasync instead of fsync
Date Thu, 27 Oct 2011 19:34:32 GMT


Peter Schuller commented on CASSANDRA-3248:

XFS should detect write barrier support and cause fsync() to actually penetrate the cache
(unless the SATA drive is lying about flushing its cache). Interesting that you seemed to
be getting caching behavior still. LVM or anything in between that breaks write barriers?

(Not truly relevant to the test, but it might be a useful data point: a case in practice
where write barriers aren't working when they are expected to.)

> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>                 Key: CASSANDRA-3248
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>            Assignee: Brandon Williams
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log device. It depends
on FileDescriptor#sync(), which invokes fsync() and therefore forces the file attributes to disk too.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata unless that
metadata is needed in order to allow a subsequent data retrieval to be  correctly handled.
 For example, changes to st_atime or st_mtime (respectively, time of last access and time
of last modification; see stat(2)) do not require flushing because they are not necessary
for a subsequent data read to be handled correctly.  On the other hand, a change to the file
size (st_size,  as  made  by  say  ftruncate(2)),  would require a metadata flush.
> File size is synced to disk by fdatasync() as well. Although the commit log recovery logic
sorts the commit log segments by their modification timestamp, it can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris, FileChannel#force(false) invokes
fdatasync(). On Windows, the false flag does not have any impact.
> On my log device (commodity SATA HDD, write cache disabled), there is a large performance
gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on --file-fsync-mode=fdatasync --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0 run
> 54.90 Requests/sec executed
>    per-request statistics:
>          min:                                  8.29ms
>          avg:                                 18.18ms
>          max:                                108.36ms
>          approx.  95 percentile:              25.02ms
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on --file-fsync-mode=fsync --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0 run
> 28.08 Requests/sec executed
>    per-request statistics:
>          min:                                 33.28ms
>          avg:                                 35.61ms
>          max:                                911.87ms
>          approx.  95 percentile:              41.69ms
> {quote}
> I do think this is a very critical performance improvement.
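
For readers who want to try the comparison from the JVM side, here is a minimal sketch, not the Cassandra code. It appends fixed-size records and syncs after each one, using FileChannel#force(false) (which, per the description above, maps to fdatasync() on Linux and Solaris) or FileChannel#force(true) (which also forces metadata). The class name SyncModes, the method appendAndSync, and the record count and size are illustrative assumptions; the 2K record size only mirrors the sysbench block size.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Cassandra.
public class SyncModes {

    // Writes `records` zero-filled records of `recordSize` bytes to `file`,
    // forcing each one to the device before writing the next.
    // metaData=false requests a data-only sync; metaData=true also forces
    // file metadata. Returns the elapsed time in nanoseconds.
    static long appendAndSync(Path file, int records, int recordSize, boolean metaData)
            throws IOException {
        ByteBuffer record = ByteBuffer.allocate(recordSize);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long start = System.nanoTime();
            for (int i = 0; i < records; i++) {
                record.clear();                 // reset position so the whole buffer is written
                while (record.hasRemaining()) {
                    ch.write(record);
                }
                ch.force(metaData);             // sync after every record
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the temp directory sits on the device under test.
        Path file = Files.createTempFile("commitlog-sketch", ".bin");
        try {
            int records = 200;
            long dataOnly = appendAndSync(file, records, 2048, false);
            long withMeta = appendAndSync(file, records, 2048, true);
            System.out.printf("force(false): %.2f ms/record%n", dataOnly / 1e6 / records);
            System.out.printf("force(true):  %.2f ms/record%n", withMeta / 1e6 / records);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The numbers only mean something when the file lives on the actual log device. On a tmpfs-backed temp directory, or on a drive with a volatile write cache that ignores flushes, both modes will look nearly free. For reference, the sysbench figures quoted above work out to 54.90 / 28.08, or roughly 1.96 times the request rate with fdatasync.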
