incubator-couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-754) Improve couch_file write performance
Date Wed, 05 May 2010 20:19:03 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864486#action_12864486 ]

Adam Kocoloski commented on COUCHDB-754:
----------------------------------------

I wrote a quick and dirty patch that sets the O_DSYNC flag via couch_icu_driver instead of calling
fsync. I'll submit a cleaned-up version later that only activates when delayed_commits = false.
Here are the results of a relaximation writer comparison test against trunk@940992:

http://mikeal.couchone.com/graphs/_design/app/_show/compareWriteTest/c34d5d47f99e11be1f591832d0004d64

So the O_DSYNC approach is clearly much faster than calling file:sync/1 after every write.
I confirmed that the fcntl actually had an effect: append_bin operations with O_DSYNC
set took ~600 µs, compared with ~100 µs without the flag on trunk.

I have no idea what kind of data integrity guarantees we get with O_DSYNC on OS X. Is it
equivalent to fsync(), or to fcntl(F_FULLFSYNC)? If it's equivalent to fcntl(F_FULLFSYNC),
this is a no-brainer. It's also a no-brainer on Linux.

> Improve couch_file write performance
> ------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in my head and
> wanted to write them down.
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file driver [1] that
> adds a new file:datasync/1 function. I suspect that we won't see much of a performance gain
> from this switch because we append to the file and thus need to update the file metadata anyway.
> On the other hand, I'm fairly certain fdatasync is always safe for our needs, so if it is
> ever more efficient we should use it. Obviously, we'll need to fall back to file:sync/1 on
> platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests. This is essentially
> Paul's zip_server [2]. To take full advantage of it we need to patch couch_btree
> to update nodes in parallel. Currently there should only be one outstanding write request in
> a couch_file at a time, so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space. We never modify
> files (aside from truncating, which is rare enough to be handled separately), so perhaps it
> would help performance to let the kernel handle the seek. We'd still need a way
> to get the file size for the make_blocks function. I'm wondering if file:read_file_info(Fd)
> is more efficient than file:position(Fd, eof) for this purpose.
> A caveat: I'm not sure if append-only files are compatible with the previous enhancement.
> There is no file:write/2, and I have no idea how file:pwrite behaves on a file which is opened
> append-only. Is the Pos ignored, or is it an error? Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez' recent blog
> post [3] and some historical discussions on pgsql-performance. Basically, it seems that opening
> a file with O_DSYNC (or O_SYNC on Linux, which is currently the same thing) and doing all
> writes synchronously is reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous
> writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when the server
> is running with delayed_commits = false. We'd need to patch the OTP file driver again, and
> also investigate cross-platform support. In particular, I don't think it works on NFS.
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html

-- 
This message is automatically generated by JIRA.

