incubator-couchdb-user mailing list archives

From Zdravko Gligic <zgli...@gmail.com>
Subject Re: Frugal Erlang vs Resources Hungry CouchDB
Date Thu, 30 Jun 2011 22:39:09 GMT
Robert Newson wrote:
>>
CouchDB *must* write an updated btree and an updated header to point
to the root of that btree every time you update a document, or it will
be lost if couch crashed right then.
<<

So, we have these 3 pieces of info that need to be written with every
update of a document:
1) the btree
2) the updated header that points to the root of the btree
3) the actual json document itself

If all 3 of these pieces are written to the same physical disk file
then I will respectfully bail out, as the rest of my question would
not make much sense, or at least not without major restructuring.
However, if (1) the btree is in a file of its own and if (2) the
updated header and (3) the actual json document are written to the
same file, then:

a) How many of the updated headers are actually useful? Is it just the
last successfully written one, or perhaps the last few?

b) If only the last or last few headers are actually useful, then could
those updated headers not be kept in a separate (perhaps
pre-formatted) file, where the header records themselves are re-used
(perhaps in a ring or some other fashion)?

c) If (a) and (b) make any sense, then would one not end up with a
perfectly compacted DB, at least for logging-type use cases, where
only new records are created and existing ones are never updated or
deleted?

d) While (c) might sound like a contrived "use case", I am asking
mostly to determine what (in addition to dead old revisions and
deleted docs) it is that is adding to the "bulkiness" of disk usage?
In other words, are those "updated headers" one of the major
contributing factors (if not all of the factors) and could that be
remedied?
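
To make (b) a bit more concrete, here is a rough Python sketch of the
kind of header ring I have in mind. This is not how CouchDB stores its
headers; it only illustrates the idea. The slot layout (sequence
number, btree root offset, CRC32), the ring size and all of the
function names are my own assumptions for the sake of illustration.

```python
import os
import struct
import zlib

# Hypothetical slot layout: sequence number, btree root offset, CRC32.
SLOT = struct.Struct("<QQI")
BODY = struct.Struct("<QQ")
N_SLOTS = 4  # assumed ring size; only the newest valid slot matters


def preformat(path):
    """Create the header file with N_SLOTS zero-filled slots."""
    with open(path, "wb") as f:
        f.write(b"\0" * (SLOT.size * N_SLOTS))


def write_header(path, seq, root_offset):
    """Overwrite the slot for this sequence number (seq starts at 1)."""
    body = BODY.pack(seq, root_offset)
    record = body + struct.pack("<I", zlib.crc32(body))
    with open(path, "r+b") as f:
        f.seek((seq % N_SLOTS) * SLOT.size)
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # push the header to disk before returning


def latest_header(path):
    """Return (seq, root_offset) of the newest intact slot, or None."""
    best = None
    with open(path, "rb") as f:
        for _ in range(N_SLOTS):
            raw = f.read(SLOT.size)
            if len(raw) < SLOT.size:
                break
            seq, root_offset, crc = SLOT.unpack(raw)
            if seq == 0 or zlib.crc32(raw[:BODY.size]) != crc:
                continue  # never-written or torn slot: skip it
            if best is None or seq > best[0]:
                best = (seq, root_offset)
    return best
```

With something like this the header file would stay at a fixed size no
matter how many updates are made, and a crash in the middle of a header
write would only cost the newest slot, since recovery falls back to the
previous intact one.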

Thanks again and regards to everyone,
teslan
