incubator-couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: new purge functionality
Date Thu, 11 Sep 2008 21:23:28 GMT
Oh yeah, I forgot to mention, this breaks the file format. Sorry.  
You'll need to dump, upgrade and import your databases.

-Damien


On Sep 11, 2008, at 4:12 PM, Damien Katz wrote:

> I just checked in the new document purge functionality, which
> removes all information about a document's existence from a database.
> New tests can be found in the test suite.
>
> Purge is not to be confused with deletion. A deletion is like an
> edit to a document, and it's replicated the same way as a document
> edit. A purge, however, is not a new document edit; rather, it is the
> elimination of the document and its metadata from that instance of
> the database, whereas a deletion still preserves the metadata. After a
> purge, the same documents on other database replica instances will be
> unaffected.
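The difference between deletion and purge can be sketched with a toy in-memory model. Everything here (the `Replica` class, its methods and the naive `replicate` helper) is hypothetical and only illustrates the semantics described above; it is not CouchDB's storage or replication code.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one database instance (hypothetical, not CouchDB's storage)."""
    # doc id -> list of revision records, oldest first; a deletion is a revision too
    docs: dict = field(default_factory=dict)

    def edit(self, doc_id, body):
        self.docs.setdefault(doc_id, []).append({"deleted": False, "body": body})

    def delete(self, doc_id):
        # A deletion is an edit: it appends a "tombstone" revision, so the
        # document's metadata survives and the change can be replicated.
        self.docs[doc_id].append({"deleted": True, "body": None})

    def purge(self, doc_id):
        # A purge drops the document and all of its metadata from this
        # instance only; nothing is left behind to replicate.
        self.docs.pop(doc_id, None)

    def changes(self):
        # What a replicator could see: each doc id with its latest revision.
        return {doc_id: revs[-1] for doc_id, revs in self.docs.items()}


def replicate(source, target):
    """Naively copy each latest revision from source to target (no rev comparison)."""
    for doc_id, latest in source.changes().items():
        target.docs.setdefault(doc_id, []).append(dict(latest))


a, b = Replica(), Replica()
a.edit("x", {"v": 1})
a.edit("y", {"v": 1})
replicate(a, b)

a.delete("x")   # recorded as a tombstone edit
a.purge("y")    # removed from replica a entirely
replicate(a, b)

print(b.docs["x"][-1]["deleted"])     # True: the deletion replicated
print("y" in a.docs, "y" in b.docs)   # False True: the purge did not
```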
>
> There are two reasons for purge: to completely remove documents you
> no longer care about (deletions from long ago), and to support
> database partitioning, where documents must be moved between
> partitions when the number of partitions is resized. Purging documents
> is generally not something application code should worry about.
>
> Because we eliminate the document's record from the database, things
> that index the database, like views and full text search, must take
> special steps to ensure their indexes no longer include the purged
> document. One way to accomplish this is to completely rebuild the
> indexes from scratch whenever something is purged. But that's very
> expensive: even if you only purge a handful of documents, you must
> reexamine every document in the database.
>
> To avoid this penalty, CouchDB keeps track of only the most recently
> purged documents. The next time it purges more documents, it forgets
> about the previously purged ones. When the indexer notices the purge
> seq has changed and it is only 1 seq number behind the database's
> purge seq, it retrieves the list of the most recently purged
> documents, removes them from the index, updates the index's purge
> seq, and then proceeds to update the index normally. If the
> database's purge seq is 2 or more ahead of the last one the index
> recorded, the index is automatically discarded and rebuilt from
> scratch.
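A minimal sketch of the indexer's catch-up decision, assuming an index stores its rows as a dict keyed by doc ID and that the database exposes its purge seq and its list of most recently purged IDs. The function `catch_up_on_purges` and its parameters are hypothetical, not the view engine's actual API.

```python
def catch_up_on_purges(index_rows, index_purge_seq, db_purge_seq,
                       last_purged_ids, rebuild):
    """Reconcile an index with the database's purges before a normal update.

    index_rows      -- dict of doc id -> indexed value
    last_purged_ids -- the database's record of its most recent purge only
    rebuild         -- callable that re-indexes every document (expensive)
    Returns (rows, new_index_purge_seq, rebuilt).
    """
    behind = db_purge_seq - index_purge_seq
    if behind == 0:
        return index_rows, index_purge_seq, False
    if behind == 1:
        # The database still remembers exactly the batch this index missed.
        purged = set(last_purged_ids)
        rows = {doc_id: v for doc_id, v in index_rows.items() if doc_id not in purged}
        return rows, db_purge_seq, False
    # Two or more purges missed: the earlier batches' ids are forgotten,
    # so the only safe option is to discard the index and rebuild it.
    return rebuild(), db_purge_seq, True


live_docs = {"c": 3}


def rebuild_from_scratch():
    return dict(live_docs)  # stands in for re-examining every document


rows = {"a": 1, "b": 2, "c": 3}
# One purge behind: drop the remembered batch incrementally.
print(catch_up_on_purges(rows, 4, 5, ["a", "b"], rebuild_from_scratch))
# ({'c': 3}, 5, False)

# Two purges behind: "a" was purged earlier and is no longer listed, so rebuild.
print(catch_up_on_purges(rows, 3, 5, ["b"], rebuild_from_scratch))
# ({'c': 3}, 5, True)
```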
>
> This is already implemented by the view engine, but the full text
> engine will still need to be modified to work with purge as well.
>
> When purging, you must specify the doc ID and the revision(s) to
> purge. If a later revision of the document already exists, the
> document isn't purged. Any document revision that doesn't exist is
> ignored. An additional limitation is that purge cannot happen during
> a compaction; the client will get an error.
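A toy model of these rules, assuming each document has a simple linear revision history (real CouchDB documents have a revision tree, so this is a simplification). `Database`, `CompactionRunningError` and the revision strings are hypothetical illustrations, not CouchDB's API.

```python
class CompactionRunningError(Exception):
    """Hypothetical error: a purge was requested while compaction is running."""


class Database:
    def __init__(self):
        self.history = {}        # doc id -> list of revisions, oldest first
        self.compacting = False

    def save(self, doc_id, rev):
        self.history.setdefault(doc_id, []).append(rev)

    def purge(self, request):
        """request maps doc id -> list of revisions to purge.

        Returns the {doc id: [revs]} that were actually purged.
        """
        if self.compacting:
            raise CompactionRunningError("cannot purge during compaction")
        purged = {}
        for doc_id, revs in request.items():
            history = self.history.get(doc_id)
            if history is None:
                continue                  # unknown document: ignored
            current = history[-1]
            if current in revs:
                del self.history[doc_id]  # the latest revision was named: purge
                purged[doc_id] = [current]
            # Otherwise each named rev was either superseded by a later
            # revision or never existed, so the document is left alone.
        return purged


db = Database()
db.save("a", "1-x")
db.save("a", "2-y")   # "a" now has a later revision than 1-x
db.save("b", "1-z")
result = db.purge({"a": ["1-x"], "b": ["1-z"], "c": ["1-q"]})
print(result)              # {'b': ['1-z']}
print(sorted(db.history))  # ['a']

db.compacting = True
try:
    db.purge({"a": ["2-y"]})
except CompactionRunningError as err:
    print("error:", err)   # error: cannot purge during compaction
```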
>
> The typical operations to efficiently and completely purge documents
> would be:
> 1. Purge the document(s).
> 2. Cause the view indexes to be refreshed (for each design doc, open
> a view with count=0; this causes all of the design doc's view indexes
> to be updated).
> 3. (Optionally) purge 0 more documents, causing the record of our
> purged documents to be dropped.
> 4. Compact the database (until this is done, remnants of the purged
> documents can still be found in the db file when dumped raw).
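The four steps can be simulated with a toy append-only database and a single index. This is a sketch under stated assumptions: the classes, the `purge_completely` helper and the simplified index refresh are hypothetical stand-ins for the real HTTP requests and view engine, chosen only to show why the steps come in this order.

```python
class Database:
    """Toy append-only database (hypothetical model, not CouchDB internals)."""

    def __init__(self):
        self.file = []          # raw "file": every write ever made, in order
        self.docs = {}          # live documents
        self.purge_seq = 0
        self.last_purged = []   # only the most recent purge batch is remembered

    def save(self, doc_id, value):
        self.file.append((doc_id, value))
        self.docs[doc_id] = value

    def purge(self, doc_ids):
        for doc_id in doc_ids:
            self.docs.pop(doc_id, None)   # the old record stays in self.file
        self.purge_seq += 1
        self.last_purged = list(doc_ids)  # earlier purged ids are forgotten

    def compact(self):
        # Rewrite the file with live documents only, dropping purged remnants.
        self.file = list(self.docs.items())


class Index:
    """Toy view index that mirrors the live documents."""

    def __init__(self):
        self.rows = {}
        self.purge_seq = 0
        self.full_rebuilds = 0

    def refresh(self, db):
        behind = db.purge_seq - self.purge_seq
        if behind == 1:
            for doc_id in db.last_purged:  # cheap path: drop just the last batch
                self.rows.pop(doc_id, None)
        elif behind > 1:
            self.rows = {}                 # missed batches: discard and rebuild
            self.full_rebuilds += 1
        self.purge_seq = db.purge_seq
        # Stand-in for the normal update: it adds or changes rows but never
        # removes them, which is why purges need the handling above.
        self.rows.update(db.docs)


def purge_completely(db, indexes, doc_ids):
    db.purge(doc_ids)        # 1. purge the documents
    for index in indexes:    # 2. refresh each index while it is 1 seq behind
        index.refresh(db)
    db.purge([])             # 3. purge nothing, dropping the record of doc_ids
    db.compact()             # 4. compact away the raw remnants


db = Database()
db.save("a", 1)
db.save("b", 2)
idx = Index()
idx.refresh(db)
purge_completely(db, [idx], ["a"])
idx.refresh(db)  # 1 seq behind with an empty purge list: nothing to remove
print(idx.rows, idx.full_rebuilds, db.file)  # {'b': 2} 0 [('b', 2)]

# Skipping step 2 leaves the index 2 purge seqs behind, forcing a rebuild.
db2 = Database()
db2.save("a", 1)
db2.save("b", 2)
idx2 = Index()
idx2.refresh(db2)
db2.purge(["a"])
db2.purge([])
idx2.refresh(db2)
print(idx2.rows, idx2.full_rebuilds)  # {'b': 2} 1
```

In this model, refreshing the indexes between steps 1 and 3 is what keeps them on the cheap incremental path; the second scenario shows the full rebuild that results when step 2 is skipped.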

