incubator-couchdb-user mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: Frugal Erlang vs Resources Hungry CouchDB
Date Thu, 30 Jun 2011 07:42:41 GMT
On Wed, Jun 29, 2011 at 19:13, Zdravko Gligic <zgligic@gmail.com> wrote:
> If these three points are more or less correct ...
>
> 1) CouchDB keeps appending to the end of the file.  Fine.
>
> 2) It needs just as much disk space when doing a compaction.  Is that
> extra space equivalent to the original uncompacted or the final
> compacted version?
>

The compacted version. In future versions of CouchDB this will be exposed
as a "data_size" (or similarly named; forgive me for not looking up the
exact name now) attribute in the response to GET /<my_db_name>. In the
past, when it was not calculated or exposed, the recommendation was to
reserve as much space as the uncompacted file, since the actual data
size was unknown.
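To make that concrete, here's a minimal sketch (in Python) of a pre-compaction
free-space check. The field names "disk_size" and "data_size" are assumptions
per the discussion above; they may differ between CouchDB versions, and
can_compact is a made-up helper, not part of any CouchDB client.

```python
# Hedged sketch: decide whether there is room to compact, assuming the
# database info response exposes "disk_size" (file size on disk) and
# "data_size" (live data). Field names are illustrative assumptions.

def can_compact(db_info, free_bytes):
    """Compaction writes a new file of roughly data_size bytes; when that
    figure is unavailable, fall back to reserving the full disk_size."""
    needed = db_info.get("data_size", db_info.get("disk_size", 0))
    return free_bytes >= needed

info = {"disk_size": 10 * 2**20, "data_size": 3 * 2**20}
print(can_compact(info, free_bytes=5 * 2**20))            # enough: 5 MiB >= 3 MiB
print(can_compact({"disk_size": 10 * 2**20}, 5 * 2**20))  # not enough: must reserve 10 MiB
```

The fallback branch mirrors the old advice: without a live-data figure, the
only safe reservation is the whole uncompacted file size.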

> 3) Compaction is similar to replication in that original documents'
> activities are "replayed" into the newly created DB version.
>
> Then ..
>
> a) What does CouchDB "know" during compaction that it does not know
> during the original writes - that would make it that much smarter?
>

It knows that you deleted some documents, or updated some (making the
older version obsolete). When the document was first written CouchDB
doesn't know this. CouchDB does not know the future (I hear that's on
the roadmap for 2.0, though).
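To illustrate the hindsight advantage (a toy sketch, not CouchDB's actual
compactor or on-disk layout): at compaction time the whole edit history is
visible, so only the latest non-deleted revision of each document needs to
be copied into the new file.

```python
# Toy compaction: given the append-only log of (doc_id, rev, deleted)
# writes, keep only the latest revision of each surviving document.
# Purely illustrative; CouchDB's real algorithm and layout differ.

log = [
    ("a", 1, False),
    ("b", 1, False),
    ("a", 2, False),   # makes ("a", 1) obsolete
    ("b", 2, True),    # "b" is deleted; both its revisions can go
    ("c", 1, False),
]

def compact(log):
    latest = {}
    for doc_id, rev, deleted in log:   # replay the log in order
        latest[doc_id] = (rev, deleted)
    # keep only the winning, non-deleted revisions
    return {d: rev for d, (rev, deleted) in latest.items() if not deleted}

print(compact(log))  # only "a" rev 2 and "c" rev 1 survive
```

At the time each entry was appended, none of this was knowable: the writer
can't tell which documents will later be updated or deleted.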

Also, CouchDB can insert documents into the compacted database file in
batches, which creates less garbage. Since CouchDB writes a header at
the end of the database file after every committed write, batching
changes into fewer commits produces fewer wasted headers. The waste is actually more
severe because the interaction between the append-only style and the
structure CouchDB uses on disk requires a lot of other metadata to be
repeatedly written and discarded as well (all the inner nodes of the
B+Tree along the path to each written document). As Paul pointed out,
there are very good reasons and some great benefits to doing things
this way, but it does use a lot of space.
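A back-of-the-envelope model of that overhead (in Python, with entirely
made-up sizes, not CouchDB's real on-disk numbers): each commit appends a
header plus the rewritten inner B+Tree nodes along the update path, so N
one-document commits pay that cost N times while a batched commit pays it
once per batch.

```python
# Rough model of commit overhead in an append-only B+Tree store.
# All sizes below are illustrative assumptions, not CouchDB's actual values.

HEADER_BYTES = 4096        # hypothetical commit header size
INNER_NODE_BYTES = 1024    # hypothetical size of one rewritten inner node
TREE_DEPTH = 3             # inner nodes rewritten per committed update path

def overhead(num_docs, docs_per_commit):
    """Bytes of headers plus rewritten inner nodes that become garbage
    once a later commit supersedes them."""
    commits = -(-num_docs // docs_per_commit)  # ceiling division
    return commits * (HEADER_BYTES + TREE_DEPTH * INNER_NODE_BYTES)

one_by_one = overhead(1000, 1)    # one header + tree path per document
batched = overhead(1000, 100)     # compaction-style batching
print(one_by_one, batched)        # batching cuts the overhead ~100x here
```

The exact numbers are fiction, but the ratio is the point: the per-commit
overhead is fixed, so fewer commits means proportionally less garbage.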

> b) Are we strictly talking about reclaiming of space held by older
> revs that have been subsequently updated or is some sort of "bulking"
> at play?

Both. See my answer to (a).

>
> c) So, what about cases in which there is next to no updating of
> existing docs - do compactions make any difference in such cases?

There are still gains to be had, just less significant ones.

>
> d) Is compaction similar to replication and if so then would a
> continuous replication result in continuously compacted DB ?

Similar in the way you state above. Different in two respects: (1)
while compaction will transfer documents in batches, once the
replication is "caught up" documents trickle in as they are written or
updated on the source so the benefits of batching are lost; and (2)
multiple writes to the same document will replicate so long as the
target is keeping up with each write (CouchDB will collapse multiple
edits during replication, but that, obviously, can't include edits yet
to occur in the future).
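A toy model of point (2), purely illustrative: per pass, a replicator
transfers only the latest revision of each changed document, so a lagging
target sees intermediate edits collapsed, while a caught-up target (checking
after every write) receives every revision individually.

```python
# Toy model: "checkpoints" are indices into the source's edit log at
# which the replicator runs. Not CouchDB's real replication protocol.

source_edits = [("doc", 1), ("doc", 2), ("doc", 3)]

def replicate(edits, checkpoints):
    """Return the (doc_id, rev) pairs actually transferred."""
    sent, pending = [], {}
    last = 0
    for cp in checkpoints:
        for doc_id, rev in edits[last:cp]:
            pending[doc_id] = rev       # intermediate edits collapse here
        sent.extend(pending.items())
        pending.clear()
        last = cp
    return sent

print(replicate(source_edits, [3]))        # lagging target: one transfer
print(replicate(source_edits, [1, 2, 3]))  # caught-up target: every rev
```

The lagging target gets only rev 3 in a single transfer; the caught-up
target transfers all three revisions, one per write, which is why the
batching benefit disappears once continuous replication has caught up.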
