From: "Paul Joseph Davis (JIRA)"
To: dev@couchdb.apache.org
Date: Sat, 9 Oct 2010 15:44:56 -0400 (EDT)
Subject: [jira] Updated: (COUCHDB-754) Improve couch_file write performance
Message-ID: <258209.56551286653496430.JavaMail.jira@thor>
In-Reply-To: <24914376.14391272896815259.JavaMail.jira@thor>

     [ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-754:
--------------------------------------

    Skill Level: Regular
Contributors Level (Easy to Medium)

> Improve couch_file write performance
> ------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in my head, wanted to write them down.
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file driver [1] that adds a new file:datasync/1 function. I suspect that we won't see much of a performance gain from this switch because we append to the file and thus need to update the file metadata anyway. On the other hand, I'm fairly certain fdatasync is always safe for our needs, so if it is ever more efficient we should use it. Obviously, we'll need to fall back to file:sync/1 on platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests. This is essentially Paul's zip_server [2]. In order to take full advantage of it we need to patch couch_btree to update nodes in parallel. Currently there should only be 1 outstanding write request in a couch_file at a time, so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space. We never modify files (aside from truncating, which is rare enough to be handled separately), so perhaps it would help with performance if we let the kernel deal with the seek. We'd still need a way to get the file size for the make_blocks function. I'm wondering if file:read_file_info(Fd) is more efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous enhancement.
> There is no file:write/2, and I have no idea how file:pwrite behaves on a file which is opened append-only. Is the Pos ignored, or is it an error? Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez' recent blog post [3] and some historical discussions on pgsql-performance. Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux, which is currently the same thing) and doing all synchronous writes is reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when the server is running with delayed_commits = true. We'd need to patch the OTP file driver again, and also investigate the cross-platform support. In particular, I don't think it works on NFS.
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
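
Two of the ideas in the issue description above (prefer fdatasync but fall back to fsync, and open the file in append mode so the kernel handles positioning) can be sketched at the operating-system level. The sketch below is in Python rather than Erlang and is not CouchDB code; `sync_data` and `append_block` are hypothetical names chosen for illustration, and the size lookup via fstat assumes a single writer.

```python
import os
import tempfile


def sync_data(fd):
    """Flush written data to stable storage.

    Prefers fdatasync and falls back to fsync when the platform (or the
    Python build) does not expose it, analogous to the proposed fallback
    from file:datasync/1 to file:sync/1.
    """
    fdatasync = getattr(os, "fdatasync", None)
    if fdatasync is not None:
        fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)
    return "fsync"


def append_block(fd, data):
    """Append data to a descriptor opened with O_APPEND.

    Returns the offset at which the block starts. The current size is read
    with fstat instead of seeking to EOF in user space; this is only
    reliable when there is a single writer (an assumption of this sketch).
    """
    offset = os.fstat(fd).st_size
    view = memoryview(data)
    while view:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]
    return offset


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.couch")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        first = append_block(fd, b"header")
        second = append_block(fd, b"document body")
        used = sync_data(fd)
        print(first, second, used)
    finally:
        os.close(fd)
```

On the caveat about positioned writes: the Linux pwrite(2) man page notes that on a file opened with O_APPEND the data is appended regardless of the offset given. How Erlang's file:pwrite behaves in that situation is not shown here and would still need to be tested.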