Date: Fri, 30 Dec 2011 16:02:40 -0800
Subject: Attachment performance testing script
From: "Eli Stevens (Gmail)"
To: user

I've been doing some performance testing of the various ways that attachments can be uploaded to CouchDB. I think that what I'm seeing points to some pathological behavior inside couch, but that's just a guess (I don't really know anything about couch internals). However, if I'm understanding the implications correctly, it might be possible to make replication much, much faster for large attachments (by speeding up the multipart API).

To get the data yourself, run 'python makedata.py' once, and then repeatedly run 'bash do-curls.sh' to get timing information (perhaps while making performance tweaks, if you're a dev). Code is on github:

https://github.com/wickedgrey/couchdb-attachment-speed

It's a bit janky, but gets the job done.

The main takeaway: the multipart API is just as slow as base64 encoding everything. Expect to pay roughly a 10x performance penalty for using either API vs. uploading the attachment separately.

All of the tests were run against a local 1.1.1 couch recently installed via brew with delayed commits set to false. Hardware was a 2010 MacBook Pro w/ 8GB of RAM, lightly loaded (browser and IDE running but idle at the same time as the tests were run). The general shape of the timing data didn't change over multiple runs. I haven't looked into couch memory or CPU usage while handling the uploads.
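
For readers less familiar with the three upload paths being compared (standalone attachment, inline base64, multipart), here is a rough Python sketch of what each request carries. It is not the code from the repository above: the function names, the boundary string, and the default content type are hypothetical, and the multipart layout follows the documented "follows": true form, with attachment parts matched to the JSON by order. It only builds request paths and bodies; nothing is sent to a server.

```python
import base64
import json


def standalone_put(db, doc_id, name, data, rev=None):
    """Path, headers and body for PUT /db/doc_id/name (raw attachment upload).

    Hypothetical helper: names are not URL-quoted here, for brevity.
    """
    path = "/%s/%s/%s" % (db, doc_id, name)
    if rev is not None:
        path += "?rev=" + rev
    headers = {"Content-Type": "application/octet-stream"}
    return path, headers, data  # body is the attachment bytes, unmodified


def inline_base64_doc(name, data, content_type="application/octet-stream"):
    """JSON body for PUT /db/doc_id with the attachment base64-encoded inline."""
    doc = {
        "_attachments": {
            name: {
                "content_type": content_type,
                # base64 inflates the payload to about 4/3 of its raw size
                "data": base64.b64encode(data).decode("ascii"),
            }
        }
    }
    return json.dumps(doc).encode("utf-8")


def multipart_related_doc(name, data,
                          content_type="application/octet-stream",
                          boundary="hypothetical-boundary"):
    """Headers and body for PUT /db/doc_id as multipart/related.

    The first part is the JSON document, whose attachment stub says
    "follows": true; the raw bytes come in the next part.
    """
    doc = {
        "_attachments": {
            name: {
                "follows": True,
                "content_type": content_type,
                "length": len(data),
            }
        }
    }
    b = boundary.encode("ascii")
    body = b"".join([
        b"--" + b + b"\r\n",
        b"Content-Type: application/json\r\n\r\n",
        json.dumps(doc).encode("utf-8"),
        b"\r\n--" + b + b"\r\n\r\n",  # attachment part, no part headers
        data,
        b"\r\n--" + b + b"--",
    ])
    headers = {"Content-Type": 'multipart/related; boundary="%s"' % boundary}
    return headers, body
```

Comparing the lengths of the three bodies for the same input shows that only the inline form grows (to about 4/3 of the raw size, inside a JSON string), while the multipart form carries the same raw bytes as the standalone PUT plus a small JSON stub.
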

n  raw        base64     multipart  py b64 encode   py b64 decode
1  0m0.136s   0m0.014s   0m0.013s   0:00:00.000015  0:00:00.000009
2  0m0.014s   0m0.016s   0m0.015s   0:00:00.000012  0:00:00.000011
3  0m0.015s   0m0.017s   0m1.027s   0:00:00.000016  0:00:00.000021
4  0m0.015s   0m0.018s   0m2.020s   0:00:00.000057  0:00:00.000090
5  0m0.017s   0m0.035s   0m2.027s   0:00:00.000361  0:00:00.000801
6  0m0.054s   0m0.202s   0m1.133s   0:00:00.003541  0:00:00.005455
7  0m0.361s   0m1.859s   0m2.318s   0:00:00.043847  0:00:00.059307
8  0m3.531s   0m19.336s  0m15.820s  0:00:00.472431  0:00:00.822210
9  0m36.594s  3m24.152s  5m45.110s  ?               ?

One of the interesting issues that I ran into when constructing the data was with trying to run a gig of text data through the Python JSON parser. It seemed that there were a couple of copies of the data being made (I'd guess the original data, then an escaped version, and then the final string?), which slowed things down quite a bit.

The current state of affairs is especially frustrating for me, since my use case doesn't permit having documents in an attachment-less (read: inconsistent) state. My ideal case would be to have the multipart API:

- Sped up to be roughly the same speed as standalone attachments
- Extended/changed/supplemented to allow for multiple documents at once, like the bulk API.

In any case, thanks for reading. I hope this helps make CouchDB even better. :)

Cheers,
Eli