incubator-couchdb-dev mailing list archives

From Nick North <nort...@gmail.com>
Subject Re: Replicated database size
Date Wed, 16 May 2012 14:49:09 GMT
Following up on my own email, this seems to be an issue with snappy on
Windows Server 2008. When I changed the file_compression setting to
deflate_6, the "large" databases went down from 7GB to 1GB after
compaction. I'm not entirely sure whether this counts as a bug, so I won't raise
an issue for it.
By the way: kudos to whoever wrote the code to deal with file_compression.
When I changed file_compression to deflate_6, the system happily worked
with the existing, supposedly snappy-compressed databases, and converted
format on the next compaction. That could have gone wrong in several ways,
but didn't, so thank you.
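For anyone wanting to repeat the switch, here is a minimal sketch, assuming a CouchDB 1.x server at a hypothetical `http://localhost:5984` and the 1.x `_config` HTTP API. It only builds the two requests (set `file_compression` in the `[couchdb]` section to `deflate_6`, then start compaction of the `hydra` database). Admin credentials, which the `_config` API needs in practice, are omitted, and the helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical server address; admin credentials are deliberately left out.
BASE_URL = "http://localhost:5984"


def set_file_compression_request(codec: str) -> urllib.request.Request:
    """Build a PUT that changes [couchdb] file_compression via the 1.x _config API."""
    # The _config API expects the new value as a JSON-encoded string, e.g. "deflate_6".
    body = json.dumps(codec).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/_config/couchdb/file_compression",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def compact_request(db_name: str) -> urllib.request.Request:
    """Build a POST that starts compaction of one database."""
    # Compaction rewrites the database file, which is when the new codec is applied.
    return urllib.request.Request(
        f"{BASE_URL}/{db_name}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    for req in (set_file_compression_request("deflate_6"), compact_request("hydra")):
        print(req.get_method(), req.full_url)
```

Passing each request to `urllib.request.urlopen` in turn would apply it. As described above, an existing database only picked up the new format once it had been compacted.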

Nick
On 15 May 2012 13:55, Nick North <north.n@gmail.com> wrote:

> I'm curious about the size of replicated CouchDB databases in comparison
> to each other. I have four databases, each with pull replications from the
> other three, but they report quite different data sizes. Two of them say:
>
> {"db_name":"hydra","doc_count":1489060,"doc_del_count":2754893,"update_seq":6998882,"purge_seq":0,"compact_running":false,"disk_size":3213656193,"data_size":1395943755,"instance_start_time":"1337067567481841","disk_format_version":6,"committed_update_seq":6998882}
>
> While the other two say this - note the difference in data_size:
>
> {"db_name":"hydra","doc_count":1489441,"doc_del_count":2755302,"update_seq":4375865,"purge_seq":0,"compact_running":false,"disk_size":7599413027,"data_size":7265993199,"instance_start_time":"1337014746154865","disk_format_version":6,"committed_update_seq":4375865}
>
> (There is some discrepancy in the doc_count because new documents are being posted
> continuously, and some arrived while I was fetching the stats for the various instances.)
> Other possibly relevant information:
>
>
>    - All the replications appear to be in working order, so I don't believe there is a
>      backlog of documents waiting to be replicated.
>    - The database has just one design view, and whether or not it has been queried does
>      not seem to make any difference to whether the database is "large" or "small".
>    - Compaction makes little difference: the "large" instances always remain much
>      larger than the "small" ones.
>    - Everything is running CouchDB 1.2 on Windows: the "small" instances on Windows 7
>      and Windows Vista, and the "large" ones on Windows Server 2008.
>    - file_compression is set to "snappy" in all cases, and there are no attachments anywhere.
>
> Can anyone suggest what might be going on here? My best guess is that it has to do with
> file compression on Windows Server, but that is only a guess, so I intend to experiment
> with the other file compression options. I'd be grateful for any thoughts, as I'm planning
> disk requirements for a system with ten times the capacity of the current one and would
> very much like to do that with some certainty about file sizes. Thanks in advance for any
> help,
>
> Nick North
>
>
>
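To put numbers on the difference, here is a minimal Python sketch that works only from the two db-info responses quoted above. It computes data_size per document and the fraction of disk_size that is not live data. The helper names are hypothetical, and the linear 10x projection is an assumption (size growing in proportion to document count), not a CouchDB feature.

```python
import json

# The db-info responses quoted in the message, trimmed to the fields used here.
SMALL_INFO = json.loads(
    '{"doc_count": 1489060, "disk_size": 3213656193, "data_size": 1395943755}'
)
LARGE_INFO = json.loads(
    '{"doc_count": 1489441, "disk_size": 7599413027, "data_size": 7265993199}'
)


def size_summary(info: dict) -> dict:
    """Derive rough per-document and overhead figures from a db-info document."""
    disk = info["disk_size"]
    data = info["data_size"]
    return {
        # Rough average: data_size spread over the live document count.
        "data_bytes_per_doc": data / info["doc_count"],
        # Share of the file that is not counted in data_size (what compaction targets).
        "overhead_fraction": (disk - data) / disk,
    }


def naive_projection(info: dict, scale: float) -> float:
    """Hypothetical linear estimate of data_size if doc_count grows by `scale`."""
    return info["data_size"] * scale


if __name__ == "__main__":
    small = size_summary(SMALL_INFO)
    large = size_summary(LARGE_INFO)
    ratio = LARGE_INFO["data_size"] / SMALL_INFO["data_size"]
    print(f"small: {small['data_bytes_per_doc']:.0f} B/doc, overhead {small['overhead_fraction']:.0%}")
    print(f"large: {large['data_bytes_per_doc']:.0f} B/doc, overhead {large['overhead_fraction']:.0%}")
    print(f"large/small data_size ratio: {ratio:.2f}")
    print(f"10x projection from the small instance: {naive_projection(SMALL_INFO, 10) / 1e9:.1f} GB")
```

On these figures the "large" instances report roughly five times the data_size for essentially the same doc_count, with only a few percent of their file outside data_size. That is consistent with compaction making little difference there and with the documents simply being stored less compressed.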
