Date: Thu, 30 Jun 2011 10:00:33 +0800
Subject: Re: Frugal Erlang vs Resources Hungry CouchDB
From: sleepnova
To: user@couchdb.apache.org

I think what many people are really concerned about is how the database
size grows as the number of docs increases (space complexity). If it
grows exponentially, that's not a good sign.

So is there any official or unofficial theoretical analysis or benchmark
showing this characteristic?

2011/6/30 Paul Davis

> Teslan,
>
> I'm not sure where you got the impression that Erlang was frugal with
> disk space. In general, it's true that Erlang is pretty good at using
> a minimal amount of CPU/RAM resources while it runs, though as in all
> things, that usage will scale with load.
>
> As to disk usage, that's a direct trade-off in the design of CouchDB.
> The append-only B+tree is going to cause fragmentation in the database
> files. There are of course games we could play to minimize this to a
> certain extent, such as log-structured merge trees with more
> aggressive compaction, but then the issue becomes that we end up
> requiring more active file descriptors per database, which in turn
> hurts people who are hosting a large number of databases on a single
> node (think hosting, or a db per user account).
>
> My guess is that whoever it was on IRC was just speaking with
> conviction. We don't try to hide, by any means, the fact that CouchDB
> uses quite a bit more space than people would expect at first.
>
> As to the amount of space that can be cleaned up, it really depends on
> the specific load patterns and how aggressive people are about keeping
> the database files compacted.
> Obviously, I could write a single document hundreds of thousands of
> times without compacting, then compact and end up with a database
> that is a percent or less of the "uncompacted" size.
>
> I'm also not sure why someone would say that a 2GiB database would
> struggle with less than 2GiB of RAM. RAM usage is more or less tied to
> the number of concurrent clients accessing the database and the amount
> and type of view generation you have running. It's not really tied to
> the physical size of the database, as we don't hold caches of
> anything. There used to be a silly benchmark floating around that
> showed CouchDB handling a couple thousand requests for a small doc
> while using only 9M of RAM. Granted, that's a super idealized case,
> but I'd just point out that RAM usage is more about access patterns
> than disk usage.
>
> As to the mobile stuff, my guess would be "don't store a lot of data
> on the device". AFAIK the story for mobile developers revolves quite a
> bit around the fact that replicating data in and out of The Cloud ™
> makes it super easy for them to hold bits and pieces of a much larger
> database.
>
> In the end, the reason CouchDB uses much more disk than some would
> expect is that it's a trade-off in the grand design. Features like
> database snapshots, append-only storage to simplify consistency
> guarantees (and hot backups), and hosting a large number of dbs in a
> single Erlang VM intersect in such a way that the price we pay is
> using more bytes.
>
> Also, I'd recommend keeping an eye on development, because this is an
> active area of optimization. Filipe has been doing awesome work
> integrating snappy compression and other improvements deep down at
> the storage layer to improve the situation. We may be frank in saying
> we use a non-trivial amount of extra space, but it's not like we're
> not working on improving that situation.
> :D
>
> That ended up longer than expected. Let us know if you have any other
> questions.

--
- sleepnova
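On the space-complexity question, a toy model may help build intuition. The Python sketch below is neither CouchDB's on-disk format nor a benchmark. It assumes, hypothetically, that every write appends the full document body plus a fresh copy of each B+tree node on the leaf-to-root path, with made-up constants (fanout 100, 4096-byte nodes, 1,000-byte documents), no compression and no batching of updates. `ToyAppendOnlyStore` and `uncompacted_after_inserts` are illustrative names, not CouchDB APIs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyAppendOnlyStore:
    """Toy append-only, copy-on-write B+tree file (hypothetical model).

    Assumptions, NOT CouchDB's real format: each write appends the whole
    document plus one new node per tree level on the leaf-to-root path;
    nothing is overwritten in place, and there is no compression.
    """
    fanout: int = 100        # hypothetical keys per B+tree node
    node_bytes: int = 4096   # hypothetical size of each rewritten node
    file_bytes: int = 0      # bytes appended so far (uncompacted file size)
    live: dict = field(default_factory=dict)  # doc_id -> newest revision size

    def __post_init__(self) -> None:
        if self.fanout < 2:
            raise ValueError("fanout must be at least 2")

    def depth(self) -> int:
        """Tree levels needed to index the live documents (at least 1)."""
        n = max(len(self.live), 1)
        levels, capacity = 1, self.fanout
        while capacity < n:
            capacity *= self.fanout
            levels += 1
        return levels

    def tree_nodes(self) -> int:
        """Node count of a freshly built, fully packed tree over live docs."""
        count, width = 0, max(len(self.live), 1)
        while True:
            width = math.ceil(width / self.fanout)
            count += width
            if width == 1:
                return count

    def write(self, doc_id: str, doc_bytes: int) -> None:
        self.live[doc_id] = doc_bytes
        # Append the new body plus a fresh copy of every node up to the root;
        # the old body and old nodes stay in the file as garbage.
        self.file_bytes += doc_bytes + self.depth() * self.node_bytes

    def compact(self) -> None:
        # Copy only the newest revision of each doc into a fresh, packed tree.
        self.file_bytes = sum(self.live.values()) + self.tree_nodes() * self.node_bytes


def uncompacted_after_inserts(n: int, doc_bytes: int = 1_000) -> int:
    """File size after inserting n distinct docs with no compaction."""
    s = ToyAppendOnlyStore()
    for i in range(n):
        s.write(f"doc-{i}", doc_bytes)
    return s.file_bytes


# One document rewritten 100,000 times, then compacted.
store = ToyAppendOnlyStore()
for _ in range(100_000):
    store.write("doc-1", 1_000)
before = store.file_bytes
store.compact()
print(before, store.file_bytes, store.file_bytes / before)  # 509600000 5096 1e-05

# Distinct documents: in this model growth is roughly n * log(n), not exponential.
for n in (1_000, 10_000, 100_000):
    print(n, uncompacted_after_inserts(n))
```

Under these assumptions, the uncompacted file grows roughly like W·log N for W writes over N live documents (linear in writes, with a slowly growing tree-depth factor), and compaction shrinks it back to about the size of the live data. The single-document case mirrors the "percent or less" example in the thread. Real sizes depend on document sizes, update patterns, and how often compaction runs.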