couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Frugal Erlang vs Resources Hungry CouchDB
Date Thu, 30 Jun 2011 09:15:05 GMT
I'd say the essential thing that CouchDB "knows" during compaction
that it does not know earlier is *your documents*.

CouchDB *must* write an updated btree and an updated header pointing
to the root of that btree every time you update a document, or the
update will be lost if couch crashes right then. When couch is
compacting, it can crash with impunity, as your documents are already
stored on disk and the compaction can simply be retried. As Randall
said, it's batching. We cannot batch during normal inserts without
reducing durability to unacceptable levels for a database.
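
To put rough numbers on the batching point, here is a minimal sketch of a toy append-only tree. It is not CouchDB's actual file format: the function names, the fanout of 10 and the depth of 3 are all assumptions chosen for illustration. Each commit appends every node on the path from the touched leaves to the root, plus one header.

```python
# Toy model only: this is NOT CouchDB's real on-disk format.
# Assumes a full tree where doc id d sits under node d // fanout**(level + 1)
# at each level, and that every commit appends exactly one header.

def nodes_rewritten(doc_ids, fanout, depth):
    """Count distinct tree nodes on the leaf-to-root paths of doc_ids."""
    return len({
        (level, doc_id // fanout ** (level + 1))
        for doc_id in doc_ids
        for level in range(depth)
    })


def appended_units(doc_ids, batch_size, fanout, depth):
    """Nodes plus headers appended when doc_ids are committed in batches."""
    total = 0
    for start in range(0, len(doc_ids), batch_size):
        batch = doc_ids[start:start + batch_size]
        total += nodes_rewritten(batch, fanout, depth) + 1  # +1 header per commit
    return total


docs = list(range(100))
per_update = appended_units(docs, batch_size=1, fanout=10, depth=3)
one_batch = appended_units(docs, batch_size=100, fanout=10, depth=3)
print(per_update, one_batch)  # 400 13
```

In this model, committing 100 documents one at a time appends 400 units, while a single batch appends 13, because the batch shares inner nodes and writes only one header. The per-update path is the one that survives a crash after every write.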

B.

On 30 June 2011 08:42, Randall Leeds <randall.leeds@gmail.com> wrote:
> On Wed, Jun 29, 2011 at 19:13, Zdravko Gligic <zgligic@gmail.com> wrote:
>> If these three points are more or less correct ...
>>
>> 1) CouchDB keeps appending to the end of the file.  Fine.
>>
>> 2) It needs just as much disk space when doing a compaction.  Is that
>> extra space equivalent to the original uncompacted or the final
>> compacted version?
>>
>
> Compacted version. In future versions of CouchDB this size will be
> exposed as a "data_size" (or similar, forgive me if I don't look up
> the name now) attribute on the response to GET /<my_db_name>. In the
> past, when it was not calculated and exposed, the recommendation was
> to reserve as much space as the uncompacted file, since the actual
> data size was unknown.
>
>> 3) Compaction is similar to replication in that original documents'
>> activities are "replayed" into the newly created DB version.
>>
>> Then ..
>>
>> a) What does CouchDB "know" during compaction that it does not know
>> during the original writes - that would make it that much smarter?
>>
>
> It knows that you deleted some documents, or updated some (making the
> older version obsolete). When the document was first written CouchDB
> doesn't know this. CouchDB does not know the future (I hear that's on
> the roadmap for 2.0, though).
>
> Also, CouchDB can insert documents into the compacted database file in
> batches, which creates less garbage. If you consider that CouchDB
> needs to write a header at the end of the database file after every
> write is committed, committing changes in fewer writes by batching the
> changes produces fewer wasted headers. The waste is actually more
> severe because the interaction between the append-only style and the
> structure CouchDB uses on disk requires a lot of other metadata to be
> repeatedly written and discarded as well (all the inner nodes of the
> B+Tree along the path to each written document). As Paul pointed out,
> there are very good reasons and some great benefits to doing things
> this way, but it does use a lot of space.
>
>> b) Are we strictly talking about reclaiming of space held by older
>> revs that have been subsequently updated or is some sort of "bulking"
>> at play?
>
> Both. See my answer to (a).
>
>>
>> c) So, what about cases in which there is next to no updating of
>> existing docs? Do compactions make any difference in such cases?
>
> Still gains to be had, just less significant.
>
>>
>> d) Is compaction similar to replication, and if so, would a
>> continuous replication result in a continuously compacted DB?
>
> Similar in the way you state above. Different in two respects: (1)
> while compaction will transfer documents in batches, once the
> replication is "caught up" documents trickle in as they are written or
> updated on the source, so the benefits of batching are lost; and (2)
> multiple writes to the same document will each replicate as long as
> the target keeps up with every write (CouchDB will collapse multiple
> edits during replication, but that, obviously, can't include edits yet
> to occur in the future).
>
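
On point (2) in the quoted text, the space advice can be sketched as a small helper. This is only an illustration: `compaction_headroom` is a hypothetical name, `db_info` is assumed to be the parsed JSON body of GET /<my_db_name>, and the "data_size" and "disk_size" field names are assumptions that may differ between CouchDB versions.

```python
# Assumption: db_info is the parsed JSON body of GET /<my_db_name>.
# "disk_size" and "data_size" are assumed field names; check your version.

def compaction_headroom(db_info):
    """Return a rough number of free bytes to reserve before compacting."""
    data_size = db_info.get("data_size")
    if data_size is not None:
        return data_size          # estimate of the compacted file's size
    return db_info["disk_size"]   # conservative fallback: the uncompacted size


print(compaction_headroom({"disk_size": 1000, "data_size": 300}))  # 300
print(compaction_headroom({"disk_size": 1000}))                    # 1000
```

The fallback mirrors the older recommendation: when the data size is not reported, reserve as much space as the uncompacted file occupies.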
