Date: Fri, 18 Jun 2010 13:06:53 -0600
Subject: Re: using gzip for db and view indexes
From: Norman Barker
To: user@couchdb.apache.org

Adam,

I agree. As we grow our system we are probably going to want compression
in some cases. I will look into this by making the changes in couch_file
as you suggest and report back.

Norman

On Fri, Jun 18, 2010 at 5:27 AM, Adam Kocoloski wrote:
> On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I am looking at the CouchDB database and view index directory and I
>> see the files are saved as binary. My indexes and database are getting
>> fairly large, so I tried gzipping them (by hand) and it made a big
>> difference (at least for my data).
>>
>> Looking at
>>
>> http://www.erlang.org/doc/man/file.html
>>
>> I see that compressed is an option when reading or writing a file. Is
>> it worth trying this out? Could it be an option in the ini file so we
>> could trade off database size versus a possible lag in access?
>>
>> I can look into this. Does everything go through the couch_file
>> module, and is there a suitable test dataset that we can analyse
>> performance with?
>>
>> thanks,
>>
>> Norman
>
> Hi Norman, I'd support making gzip compression a config option.
> Yes, everything goes through couch_file, so adding a flag to the
> term_to_binary calls in append_term and append_term_md5 would get you
> there.
>
> You should search the archives for a discussion about this. We used to
> compress the terms, and IIRC it almost cut the file size in half.
> However, it also introduced a measurable drop in write throughput.
> That's a tradeoff I'm sure some folks would be willing to make.
>
> One other interesting thing to investigate might be to have separate
> compression settings for document bodies and btree nodes. It could be
> that one compresses more effectively than the other. Best,
>
> Adam
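
The tradeoff discussed in this thread (smaller files against lower write throughput) can be estimated on sample data before changing couch_file. What follows is only a rough sketch in Python, not CouchDB code: it compresses each JSON document separately with zlib at several levels and reports total size and compression time. The function names `measure` and `sample_docs` and the generated documents are hypothetical, and zlib over JSON text is an assumption standing in for compressed Erlang terms, so the numbers indicate a trend rather than what CouchDB would actually write.

```python
import json
import time
import zlib


def measure(docs, level):
    """Compress each document on its own.

    Returns (raw_bytes, compressed_bytes, seconds spent in zlib.compress).
    """
    raw_total = 0
    packed_total = 0
    elapsed = 0.0
    for doc in docs:
        raw = json.dumps(doc, sort_keys=True).encode("utf-8")
        start = time.perf_counter()
        packed = zlib.compress(raw, level)
        elapsed += time.perf_counter() - start
        # Round-trip check: decompression must return the original bytes.
        assert zlib.decompress(packed) == raw
        raw_total += len(raw)
        packed_total += len(packed)
    return raw_total, packed_total, elapsed


def sample_docs(n):
    # Hypothetical, highly repetitive documents; real data will
    # compress differently, so substitute an export of your own.
    return [
        {
            "_id": "doc-%06d" % i,
            "type": "reading",
            "sensor": "s%d" % (i % 10),
            "values": [i % 7] * 50,
            "note": "temperature reading " * 5,
        }
        for i in range(n)
    ]


if __name__ == "__main__":
    docs = sample_docs(1000)
    # Level 0 stores the data uncompressed and serves as a baseline.
    for level in (0, 1, 6, 9):
        raw, packed, secs = measure(docs, level)
        print("level %d: %d -> %d bytes (%.0f%%), %.3f s"
              % (level, raw, packed, 100.0 * packed / raw, secs))
```

Running the same measurement separately over document bodies and over a dump of index data would be one way to explore the idea of separate compression settings, since the two may compress very differently.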