From: Daniel Gonzalez
Date: Fri, 16 Mar 2012 10:10:44 +0100
Subject: Re: Size of couchdb documents
To: user@couchdb.apache.org

> Hi, Daniel. That's great news! Also, I have an update from a CouchDB 1.2.0
> test.
>
> I have a database here with 10 million documents, most several KB of
> English text. Upgrading to version 1.2 changed the database size from
> 38GB to 9.2GB, or now 0.94 KB per document.

That is interesting. Is CouchDB reducing the size of your stored data? Is it compression? Or is the average size of your input data smaller than 0.94KB? (I am not sure what "most several KB" means.)

> So you should see an even greater improvement when 1.2.0 comes out
> Real Soon Now.
>
> > I have one more question. Is the alphabet I have shown above "ordered" for
> > couchdb?
>
> The sort order may not be quite what you expect, especially if you
> work with Unix or servers a lot.
>
> It is described here:
> http://wiki.apache.org/couchdb/View_collation#Collation_Specification
>
> Basically CouchDB follows (uses!) ICU. The major point is that
> different letter sequences are compared case-insensitively, but
> same-letter strings are case sensitive (lower case first). To me, it
> more or less follows how an English dictionary would do it.
>
> --
> Iris Couch

I have now changed my encoding dictionary to:

"-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"

as suggested by Jamie Talbot. That seems to be ordered in the ICU (or UCA?) sense.
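For illustration, here is a minimal sketch of how a 64-character dictionary like the one above could be used to encode integer IDs compactly. It assumes the dictionary serves as the digit set of a base-64 positional encoding; the function names `encode` and `decode` and the `width` padding parameter are hypothetical and do not come from the thread. The sketch only demonstrates the round trip. It does not check how CouchDB's ICU collation orders multi-character strings, which, as quoted above, compares different letter sequences case-insensitively first.

```python
# The dictionary quoted in the message: "-", "@", digits, then aAbB...zZ.
ALPHABET = "-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
BASE = len(ALPHABET)  # 64 symbols
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int, width: int = 0) -> str:
    """Encode a non-negative integer using ALPHABET as base-64 digits.

    `width` (hypothetical parameter) left-pads with the zero symbol "-"
    so that all encoded IDs have the same length.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def decode(s: str) -> int:
    """Inverse of encode(): turn an encoded string back into an integer."""
    n = 0
    for ch in s:
        n = n * BASE + INDEX[ch]
    return n
```

With this sketch, `encode(64)` gives `"@-"` and `decode(encode(n))` returns `n` for any non-negative `n`. Fixed-width padding is a design choice that keeps all IDs the same length.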
Regarding the size of documents: now that I have nearly 20 million documents in 7.4GB, I can definitely say that the situation has indeed improved a lot. I now have 400 bytes/doc, down from 3KB/doc originally.
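The per-document figure can be checked with a quick calculation. This is only a rough sketch: it assumes decimal gigabytes (10^9 bytes) and exactly 20 million documents, whereas the message says "nearly 20 million", so the result lands somewhat below the quoted 400 bytes/doc. The helper name `bytes_per_doc` is made up for this example.

```python
def bytes_per_doc(total_bytes: int, doc_count: int) -> float:
    """Average on-disk size per document, in bytes."""
    return total_bytes / doc_count


GB = 10**9  # assumption: decimal gigabytes

# 7.4GB across (at most) 20 million documents.
current = bytes_per_doc(74 * GB // 10, 20_000_000)
print(current)  # 370.0

# Reduction factor implied by the quoted figures: 3KB/doc down to 400 bytes/doc.
print(3000 / 400)  # 7.5
```

With fewer than 20 million documents the average rises toward the 400 bytes/doc quoted in the message.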