couchdb-user mailing list archives

From CGS <cgsmcml...@gmail.com>
Subject Re: Database size seems off even after compaction runs.
Date Fri, 23 Dec 2011 12:48:42 GMT
Hi,

Sorry to jump in with a question, but why not work with a buffer
database? I mean, replicate to another database through a filter that
excludes the deleted documents. That way you can clean all your
databases, and you only need the extra space temporarily (during the
"cleaning" process). Another idea would be to use two databases: one
active and one inactive at any given time. That is, you move the data
from one to the other, filtering out the deleted documents, and when
that is done you switch to the newly built database, while the other
one gets emptied (deleted and re-created). Just my 2c.

CGS
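
The buffer-database idea above can be sketched with CouchDB's filtered replication. This is only an illustration under stated assumptions: the database names (`inbox`, `inbox_clean`) and the design document and filter names (`cleanup`, `live_docs`) are made up for the example, and the code only builds the JSON payloads rather than talking to a server. Note that a target built this way never hears about deletions, so it suits a one-off rebuild, not an ongoing two-way sync.

```python
import json

# JavaScript filter function, stored as a string in a design document on
# the source database. It lets every change through except deletion
# tombstones (documents carrying _deleted: true).
FILTER_JS = "function(doc, req) { return !doc._deleted; }"


def build_design_doc(ddoc_name="cleanup", filter_name="live_docs"):
    """Design document holding the filter (both names are hypothetical)."""
    return {
        "_id": "_design/" + ddoc_name,
        "filters": {filter_name: FILTER_JS},
    }


def build_replication_request(source, target,
                              ddoc_name="cleanup", filter_name="live_docs"):
    """Request body for a one-off filtered replication (POST /_replicate)."""
    return {
        "source": source,
        "target": target,
        "create_target": True,  # create the clean database if it is missing
        "filter": ddoc_name + "/" + filter_name,
    }


if __name__ == "__main__":
    # Hypothetical database names, for illustration only.
    print(json.dumps(build_design_doc(), indent=2))
    print(json.dumps(build_replication_request("inbox", "inbox_clean"),
                     indent=2))
```

The design document would be saved into the source database first, then the second payload posted to `/_replicate`. Once the copy finishes, the application switches to the new database and the old one is deleted and re-created, as described above.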





On 12/23/2011 01:20 PM, Henrik Lundgren wrote:
> OK, so how do I prevent the database from consuming all disk space in
> the long run?
>
> I'm developing an application that is quite insert-heavy (about 6 Gb
> / day); the database is essentially a message inbox.
>
> I plan to delete obsolete messages in a housekeeping job, but if
> CouchDB will retain the latest revision of all documents I might have
> to reconsider using CouchDB, which is a pity :-(
>
> Henrik
>
> On Fri, Dec 23, 2011 at 12:36 PM, Marcello Nuccio
> <marcello.nuccio@gmail.com>  wrote:
>> OK, I've added the replies from Robert and Paul to
>> http://wiki.apache.org/couchdb/FUQ
>>
>> So is it right to say that there is information that can't be
>> deleted from a database, for example the _id of documents?
>>
>> Thanks for the clarifications, since this behaviour was totally
>> non-obvious to me.
>>
>> Marcello
>>
>> 2011/12/23 Robert Newson<rnewson@apache.org>:
>>> An update to the wiki would be very helpful.
>>>
>>> It's worth saying again that compaction does *not* remove "deleted
>>> documents’ contents". We keep the latest revision of every document
>>> ever seen, even if that revision has _deleted:true in it. This is so
>>> that replication can ensure eventual consistency between replicas. Not
>>> only will all replicas agree on which documents are present and which
>>> are not, but also the contents of both.
>>>
>>> B.
>>>
>>> On 23 December 2011 08:11, Marcello Nuccio<marcello.nuccio@gmail.com> wrote:
>>>> 2011/12/23 Paul Davis<paul.joseph.davis@gmail.com>:
>>>>> On Thu, Dec 22, 2011 at 7:00 PM, Jens Alfke<jens@couchbase.com> wrote:
>>>>>> On Dec 22, 2011, at 1:44 PM, Chris Stockton wrote:
>>>>>>
>>>>>> Okay, so this catches me a bit off guard, always thought compaction
>>>>>> cleaned those up.
>>>>>>
>>>>>> Compaction removes old revisions’ and deleted documents’ contents,
>>>>>> but their revision histories are still there. Those should be
>>>>>> pretty small, though, since they’re just trees of revision IDs.
>>>>>>
>>>>>> (Unless you did delete the docs by just setting a “_deleted”
>>>>>> attribute? I don’t know what the behavior of that would be; sounds
>>>>>> like it doesn’t actually delete the document from the database, in
>>>>>> which case maybe the last revision data does get left behind.)
>>>>>>
>>>>>> —Jens
>>>>> Deleted documents specifically allow for a body to be set in the
>>>>> deleted revision. The intention for this is to have a "who deleted
>>>>> this" type of meta data for the doc. Some client libraries delete docs
>>>>> by grabbing the current object blob, adding a '"_deleted": true'
>>>>> member, and then sending it back which inadvertently (in most cases)
>>>>> keeps the last doc body around after compaction.
>>>> Can I add this information to the wiki?
>>>> I think it would be very useful in
>>>> http://wiki.apache.org/couchdb/Compaction
>>>> and in http://wiki.apache.org/couchdb/FUQ
>>>>
>>>> Marcello
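
Paul's point about how client libraries delete documents can be illustrated with a small comparison. This is a sketch, not taken from any particular library: the helper names and the sample document are hypothetical, and the code only builds the document bodies that would be sent to CouchDB (for example in a PUT to the document URL or through `_bulk_docs`).

```python
import json


def naive_tombstone(doc):
    """Copy the whole document and add the flag, as some libraries do.

    The full body travels with the deleted revision, so it is kept
    even after compaction.
    """
    stub = dict(doc)
    stub["_deleted"] = True
    return stub


def minimal_tombstone(doc):
    """Keep only the fields needed to record the deletion."""
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


if __name__ == "__main__":
    # Hypothetical inbox message, for illustration only.
    message = {
        "_id": "msg-0001",
        "_rev": "1-abc",
        "subject": "hello",
        "body": "x" * 1000,
    }
    for build in (naive_tombstone, minimal_tombstone):
        size = len(json.dumps(build(message)))
        print(build.__name__, size, "bytes")
```

For an insert-heavy inbox like the one Henrik describes, writing the minimal form in the housekeeping job keeps each retained deletion stub small, although the stub itself still stays in the database.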

