From: "Adam Kocoloski (JIRA)"
To: dev@couchdb.apache.org
Date: Mon, 3 May 2010 10:52:55 -0400 (EDT)
Subject: [jira] Commented: (COUCHDB-754) Investigate alternative couch_file writer implementations

    [ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863371#action_12863371 ]

Adam Kocoloski commented on
COUCHDB-754:
----------------------------------------

Bah, another correction -- I meant that opening files with O_SYNC would be a compelling choice for "delayed_commits = false", of course.

> Investigate alternative couch_file writer implementations
> ---------------------------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>             Fix For: 1.1
>
>
> I've got a number of possible enhancements to couch_file floating around in my head, and wanted to write them down.
>
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file driver [1] that adds a new file:datasync/1 function. I suspect that we won't see much of a performance gain from this switch because we append to the file and thus need to update the file metadata anyway. On the other hand, I'm fairly certain fdatasync is always safe for our needs, so if it is ever more efficient we should use it. Obviously, we'll need to fall back to file:sync/1 on platforms where the datasync function is not available.
>
> * Use file:pwrite/2 to batch together multiple outstanding write requests. This is essentially Paul's zip_server [2]. In order to take full advantage of it we need to patch couch_btree to update nodes in parallel. Currently there should only be 1 outstanding write request in a couch_file at a time, so it wouldn't help at all.
>
> * Open the file in append mode and stop seeking to eof in user space. We never modify files (aside from truncating, which is rare enough to be handled separately), so perhaps it would help with performance if we let the kernel deal with the seek. We'd still need a way to get the file size for the make_blocks function. I'm wondering if file:read_file_info(Fd) is more efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous enhancement. There is no file:write/2, and I have no idea how file:pwrite behaves on a file which is opened append-only. Is the Pos ignored, or is it an error? Will have to test.
>
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez' recent blog post [3] and some historical discussions on pgsql-performance. Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux, which is currently the same thing) and doing all synchronous writes is reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
>
> At the very least, this looks to be a compelling choice for file access when the server is running with delayed_commits = true. We'd need to patch the OTP file driver again, and also investigate the cross-platform support. In particular, I don't think it works on NFS.
>
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html
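
For anyone who wants to poke at the system-call side of these ideas outside of Erlang, here is a rough sketch in Python of three of them: fdatasync with an fsync fallback, opening in append mode and taking the size from fstat instead of seeking to eof, and optionally adding O_DSYNC at open time. It is not couch_file code and says nothing about how the OTP file driver behaves. The names sync_data, open_append and append_block are made up for illustration, and it assumes a single writer per file.

```python
import os


def sync_data(fd):
    """Flush written data to disk and report which call was used.

    Prefers fdatasync where this platform's os module exposes it,
    otherwise falls back to fsync.
    """
    if hasattr(os, "fdatasync"):
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)
    return "fsync"


def open_append(path, dsync=False):
    """Open (creating if needed) a file for append-only writes.

    With dsync=True, O_DSYNC is added when the platform defines it;
    where it is missing the flag silently contributes nothing.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        flags |= getattr(os, "O_DSYNC", 0)
    return os.open(path, flags, 0o644)


def append_block(fd, data):
    """Append data and return the offset at which it starts.

    The offset is the file size reported by fstat just before the
    write, so no user-space seek to eof is needed. This is only
    correct with a single writer (an assumption of this sketch).
    """
    offset = os.fstat(fd).st_size
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # the kernel positions the write at eof
        view = view[written:]
    return offset
```

A caller would do something like fd = open_append("test.couch"), then pos = append_block(fd, b"..."), then sync_data(fd) at commit time. With dsync=True the explicit sync call becomes redundant on platforms that honour O_DSYNC, which is the trade-off the last bullet is about.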