Date: Sun, 30 Oct 2011 00:13:32 +0000 (UTC)
From: "Randall Leeds (Commented) (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Message-ID: <15911396.37605.1319933612399.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-754) Improve couch_file write performance

    [ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139501#comment-13139501 ]

Randall Leeds commented on
COUCHDB-754:
---------------------------------------

Jan: We'd always love things to be faster and there's a lot of specific info in this ticket. I'd leave it open.

> Improve couch_file write performance
> ------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>    Affects Versions: 1.0, 1.0.1
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>         Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in my head, wanted to write them down.
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file driver [1] that adds a new file:datasync/1 function. I suspect that we won't see much of a performance gain from this switch because we append to the file and thus need to update the file metadata anyway. On the other hand, I'm fairly certain fdatasync is always safe for our needs, so if it is ever more efficient we should use it. Obviously, we'll need to fall back to file:sync/1 on platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests. This is essentially Paul's zip_server [2]. In order to take full advantage of it we need to patch couch_btree to update nodes in parallel. Currently there should only be 1 outstanding write request in a couch_file at a time, so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space. We never modify files (aside from truncating, which is rare enough to be handled separately), so perhaps it would help with performance if we let the kernel deal with the seek. We'd still need a way to get the file size for the make_blocks function. I'm wondering if file:read_file_info(Fd) is more efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous enhancement. There is no file:write/2, and I have no idea how file:pwrite behaves on a file which is opened append-only. Is the Pos ignored, or is it an error? Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez' recent blog post [3] and some historical discussions on pgsql-performance. Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux, which is currently the same thing) and doing all synchronous writes is reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when the server is running with delayed_commits = true. We'd need to patch the OTP file driver again, and also investigate the cross-platform support. In particular, I don't think it works on NFS.
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
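
For readers comparing the fdatasync and O_DSYNC items in the quoted description, here is a minimal C sketch of the two durability strategies at the POSIX level, where the OTP file driver would have to implement them. It is not taken from the attached patches or from couch_file. The function names (write_all, flush_data, append_then_sync, append_dsync) are hypothetical, and the fallback from fdatasync to fsync is only an assumption about how the "fall back to file:sync/1" idea might look in C.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: on platforms without O_DSYNC, O_SYNC is the closest substitute. */
#ifndef O_DSYNC
#define O_DSYNC O_SYNC
#endif

/* Write all len bytes, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Flush file data, preferring fdatasync and falling back to fsync
 * where synchronized I/O is not advertised by the platform. */
static int flush_data(int fd) {
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);
#else
    return fsync(fd);
#endif
}

/* Strategy A: open in append mode (the kernel positions each write
 * at end of file), write, then flush explicitly. */
int append_then_sync(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    int rc = write_all(fd, buf, len);
    if (rc == 0) rc = flush_data(fd);
    if (close(fd) != 0) rc = -1;
    return rc;
}

/* Strategy B: open with O_DSYNC so every write is synchronous
 * and no separate flush call is issued. */
int append_dsync(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
    if (fd < 0) return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) != 0) rc = -1;
    return rc;
}
```

Both functions return 0 on success and -1 on failure. Timing repeated calls to each against the same file system would be one way to check whether the latency gap reported in [3] shows up on a given platform; the sketch itself makes no claim about which is faster.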