couchdb-dev mailing list archives

From Paul Davis <>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/
Date Tue, 12 Jan 2010 04:36:31 GMT
On Mon, Jan 11, 2010 at 10:51 PM, Roger Binns <> wrote:
> Robert Newson wrote:
>> Unless I misread you, you are implying that _id is stored increasingly
>> less efficiently in the .couch file as its length increases? I don't
>> think, unless you've really dug into the disk structure, that this
>> assertion will hold.
> I don't have enough data sets (or math background) to work out the exact
> relationship.  At the simplest level adding one byte to the _id length
> results in more than num_documents*1 bytes increase in file size.  It at
> least doubles since the _id is also stored in a btree node.  And in my tests
> it appears to be more than double but I don't see an exact formula since it
> presumably depends on other factors as well such as the "more
> nodes/turnover" you mention.
> At the simplest level, when using a non-trivial number of documents with
> CouchDB it is a bad idea to use long ids.  Shorter ones result in a lot less
> disk space being consumed and hence less I/O, shorter replication times etc.
> I assume the _id keys are also included in views, so again each byte of _id
> length is stored multiple times.
> Roger

I reckon that a longer _id is going to result in a greater-than-linear
storage requirement. There's a function class I can't remember that's
roughly related to this. It's quite tied into other things like
randomness and other bits, so it's hard to say for certain in math terms
what the exact effects would be.
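
For a rough sense of scale, the lower bound Roger describes above (each extra _id byte stored at least twice, once with the document and once in a btree node) can be sketched as plain arithmetic. This is only an illustration: the function name and the copies_per_doc parameter are made up here, the factor of two is taken from the quoted message rather than from the .couch file format, and the real growth is expected to be larger.

```python
def min_extra_bytes(num_documents: int, extra_id_bytes: int,
                    copies_per_doc: int = 2) -> int:
    """Lower bound on file growth from lengthening every _id.

    copies_per_doc=2 is an assumption taken from the thread: one copy
    stored with the document and one in a btree node. Views, btree node
    turnover and other overhead are ignored, so real growth is higher.
    """
    return num_documents * extra_id_bytes * copies_per_doc


# Example: one million documents, each _id made 10 bytes longer.
print(min_extra_bytes(1_000_000, 10))
```

Anything beyond this floor (views, node rewrites, key ordering effects) would have to be measured on a real database.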

Paul Davis
