incubator-couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: Size of couchdb documents
Date Fri, 16 Mar 2012 09:10:44 GMT
>
> Hi, Daniel. That's great news! Also, I have an update from a CouchDB 1.2.0
> test.
>
> I have a database here with 10 million documents, most several KB of
> English text. Upgrading to version 1.2 changed the database size from
> 38GB to 9.2GB, or now 0.94 KB per document.
>

That is interesting. Is CouchDB reducing the size of your stored data?
Compression? Or is the average size of your input data smaller than 0.94KB?
(I am not sure what "most several KB" means)


>
> So you should see an even greater improvement when 1.2.0 comes out
> Real Soon Now.
>
> > I have one more question. Is the alphabet I have shown above "ordered" for
> > couchdb?
>
> The sort order may not be quite what you expect, especially if you
> work with Unix or servers a lot.
>
> It is described here:
> http://wiki.apache.org/couchdb/View_collation#Collation_Specification
>
> Basically CouchDB follows (uses!) ICU. The major point is that
> different letter sequences are compared case-insensitively, but
> same-letter strings are case sensitive (lower case first). To me, it
> more or less follows how an English dictionary would do it.
>
> --
> Iris Couch
>

I have now changed my encoding dictionary to:

"-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"

This was suggested by Jamie Talbot. It seems to be ordered in the ICU (or UCA?)
sense.
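
A rough way to sanity-check that ordering without ICU at hand is sketched
below. It is only an approximation of the rule quoted above (letters compared
case-insensitively first, lower case winning ties), not CouchDB's actual
collator. The helper names (primary, approx_collation_key, is_ordered) are
made up for illustration, and the assumption that punctuation sorts before
digits and digits before letters, with "-" before "@", should be confirmed
against the View_collation page or a real ICU collator.

```python
# The encoding dictionary from this message.
ALPHABET = "-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"


def primary(ch):
    """Case-insensitive rank of one character.

    Assumption (not taken from ICU itself): punctuation < digits < letters,
    and characters within a class follow code point order of their
    lower-cased form.
    """
    if ch.isalpha():
        return (2, ch.lower())
    if ch.isdigit():
        return (1, ch)
    return (0, ch)


def approx_collation_key(s):
    """Two-level key: base characters first, then case (lower before upper)."""
    return ([primary(c) for c in s], [c.isupper() for c in s])


def is_ordered(alphabet):
    """True if the characters are already sorted under the approximate key."""
    chars = list(alphabet)
    return chars == sorted(chars, key=approx_collation_key)


print(is_ordered(ALPHABET))
# Plain byte-wise (ASCII) sorting gives a different order, for comparison.
print("".join(sorted(ALPHABET)))
```

Under these assumptions the dictionary comes out as ordered, while a plain
ASCII sort would place "@" after the digits and all upper case letters before
the lower case ones.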

Regarding the size of documents: with nearly 20 million documents now taking
7.4GB, I can definitely say that the situation has improved a lot. I am now
at about 400 bytes/doc, down from 3KB/doc originally.
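
For reference, the per-document figures in this thread can be reproduced with
a quick calculation. This is only a sketch: it assumes "GB" means GiB (1024^3
bytes) and that the document counts are exactly 10 and 20 million, whereas
the messages only give approximate counts. The function name bytes_per_doc is
made up for illustration.

```python
def bytes_per_doc(total_gib, doc_count):
    """Average on-disk bytes per document, treating sizes as GiB (assumption)."""
    return total_gib * 1024**3 / doc_count


# My database: 7.4GB for (nearly) 20 million documents.
print(round(bytes_per_doc(7.4, 20_000_000)))

# The quoted 1.2.0 test: 38GB before and 9.2GB after, 10 million documents.
print(round(bytes_per_doc(38, 10_000_000)))
print(round(bytes_per_doc(9.2, 10_000_000)))
```

The first figure lands just under 400 bytes, and the quoted test comes out at
roughly 4 KB per document before the upgrade and a little under 1 KB after.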
