-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Roger Binns wrote:
> I'll do the requisite experiments this weekend trying to see what has the
> most effect on file size
And the answer is _id length. It is also quicker to add documents with sequential
(sorted) _ids. The length of the _id field affects the final file size, and the
effect appears to be more than the simple multiple of the _id size suggested in
earlier messages. Somewhat amusingly, compaction increased file sizes, and not
by a trivial amount either.
To measure this, I wrote a simple Python script that created 65536 documents
with 4-byte hex _ids, then tried again with the _ids zero-padded to 8 and 16
bytes, plus various other permutations. It is an embarrassingly small script
(and likely just as small in other languages).
[Sorry for not publishing the script - BitBucket and I are having some
mutual hatred issues at the moment.]
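Since the script itself isn't available, here is a minimal sketch of that kind of experiment. It is not the original script, and several details are assumptions: a local CouchDB at http://127.0.0.1:5984 with no authentication, documents that contain nothing but an _id, loading through _bulk_docs in batches of 1000, and target databases that do not exist yet. The helper names (make_ids, request, file_size, run) are hypothetical. The size is read from "disk_size" (older CouchDB) or "sizes.file" (newer releases).

```python
import json
import time
import urllib.request

# Assumption: a local CouchDB on the default port with no authentication.
COUCH = "http://127.0.0.1:5984"


def make_ids(width, count=65536):
    """Sequential lowercase hex _ids, zero-padded to `width` characters.

    With width=4 the 65536 ids cover exactly 0000..ffff; wider widths only
    add leading zeros, so the (already sorted) order is unchanged.
    """
    return [f"{i:0{width}x}" for i in range(count)]


def request(method, path, body=None):
    """Send one request to CouchDB and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    if data is None and method == "POST":
        data = b""  # send an explicit empty body (e.g. for _compact)
    req = urllib.request.Request(
        COUCH + path, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def file_size(db):
    """On-disk size: `disk_size` in older CouchDB, `sizes.file` in newer."""
    info = request("GET", "/" + db)
    return info.get("sizes", {}).get("file", info.get("disk_size"))


def run(db, ids, batch=1000):
    """Load `ids` as _id-only documents, then compact; return both sizes."""
    request("PUT", "/" + db)  # assumes the database does not exist yet
    for start in range(0, len(ids), batch):
        docs = [{"_id": doc_id} for doc_id in ids[start:start + batch]]
        request("POST", f"/{db}/_bulk_docs", {"docs": docs})
    before = file_size(db)
    request("POST", f"/{db}/_compact")
    while request("GET", "/" + db)["compact_running"]:
        time.sleep(1)  # compaction runs in the background
    return before, file_size(db)
```

With a server running, `for w in (4, 8, 16): print(w, run(f"idtest_{w}", make_ids(w)))` prints the pre- and post-compaction sizes for each _id width. Shuffling the id list first (for example with `random.shuffle`) is one way to compare sorted against unsorted insertion order.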
The relationship between _id size, sparseness, file size and performance is
now best investigated by someone who understands the file format.
I've also started this page to help:
http://wiki.apache.org/couchdb/Performance
Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAktKhfMACgkQmOOfHg372QTlTQCdEawiNcqJVtHOjK61OsQNhtd+
P2gAn1gVXeknm4mfU74RlZid1+kI59dh
=RPB7
-----END PGP SIGNATURE-----