couchdb-user mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: Size of couchdb documents
Date Thu, 15 Mar 2012 23:55:40 GMT
On Thu, Mar 15, 2012 at 10:14 PM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> Hi Matthieu,
>
> This really seems to help. I am using now a base62 encoded monotonically
> increasing integer, which means my doc_id goes from "0" onwards, using the
> alphabet:
>
> ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz
>
> I am getting now 3000 docs/s, more or less stable, and the size of my
> documents has decreased from 3KB to 0.4 KB.
> I am not sure whether this metrics will worsen when the database grows, but
> my feeling is that the situation has improved a lot just by changing the
> doc_id.

Hi, Daniel. That's great news! Also, I have an update from a CouchDB 1.2.0 test.

I have a database here with 10 million documents, most of them several
KB of English text. Upgrading to version 1.2 shrank the database from
38GB to 9.2GB, which is now about 0.94 KB per document.

So you should see an even greater improvement when 1.2.0 comes out
Real Soon Now.

> I have one more question. Is the alphabet I have shown above "ordered" for
> couchdb?

The sort order may not be quite what you expect, especially if you
are used to the plain byte-order sorting common on Unix systems.

It is described here:
http://wiki.apache.org/couchdb/View_collation#Collation_Specification

Basically CouchDB follows (in fact, uses) ICU. The major point is that
strings with different letters are compared case-insensitively, while
strings that differ only in case are ordered with lower case first. To
me, it more or less follows how an English dictionary would do it.
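
To make that concrete, here is a rough Python sketch. It does not call ICU
at all: icu_like_key is a hypothetical helper that only mimics the rule
above for plain ASCII letters and digits, and encode is my guess at what a
base62 encoder over your alphabet might look like, not your actual code.

```python
# Daniel's alphabet, copied from the quoted message.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"


def icu_like_key(s):
    """Approximate sort key for ASCII letters and digits only.

    First compare ignoring case; on a tie, lower case sorts first.
    This is NOT real ICU, just an imitation of the rule described above.
    """
    # swapcase() turns lower case into upper case, which has the smaller
    # ASCII code, so the lower-case variant wins the tie-break.
    return (s.lower(), s.swapcase())


def encode(n, alphabet=ALPHABET):
    """Encode a non-negative integer using the given alphabet (assumed scheme)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


# Sort the alphabet's own characters with the approximate rule.
collated = "".join(sorted(ALPHABET, key=icu_like_key))
print(collated)              # digits first, then aAbBcC...
print(collated == ALPHABET)  # False: the alphabet is not in this order

# The first 62 ids, in the order they were generated versus sorted.
ids = [encode(n) for n in range(62)]
print(sorted(ids, key=icu_like_key) == ids)  # False
```

Under this approximation the digits come first and each lower-case letter
sits directly before its upper-case twin, so an alphabet that runs A-Z, 0-9,
a-z is not in sorted order. If ordering matters to you, one option is to
rearrange the alphabet to match whatever order your queries rely on.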

-- 
Iris Couch
