couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: View update speed improvements
Date Wed, 02 Mar 2011 14:44:59 GMT
On Mar 2, 2011, at 9:30 AM, Rob Pettefar wrote:

> On 02/03/2011 13:05, Bruno Rohée wrote:
>> On Wed, Mar 2, 2011 at 12:33 PM, Rob Pettefar
>> <rpettefar@gpslsolutions.com>  wrote:
>>>  Hi guys
>>> I've got a question about improving the speed at which views are updated in
>>> our system:
>>> 
>>> Currently we use a set of database documents to make up whole files after
>>> they have been requested out of the system. When submitted back into the
>>> database the old docs that held data are deleted and new docs are created in
>>> their place. This was done for simplicity of design. However, when a
>>> large file is submitted into the system, this involves deleting and
>>> creating a large number of docs (we are looking at around 4,000 deletes
>>> and 4,000 new docs).
>>> The views then take some time to update after this has happened.
>>> 
>>> If we were instead to modify the contents of the 4,000 documents (perhaps
>>> with some deletions and creations), would this reduce the number of updates
>>> the system has to put through the views, and thus reduce the time
>>> needed to update the views?
>> I think it's pretty dependent on your data, whether your new documents
>> are mostly identical or mostly different from the old ones. If it's
>> the former the process can be sped up quite a bit as the map function
>> will only be called on the changed documents; if it's the latter, not
>> much speed gain is to be expected, IMHO.
> This would probably involve writing over the content of the document, even with the
> same data as before, incurring a new revision number. I guess that this would cause the map
> functions to be run over it again.
> However, I think the key thing here is a question of how mass deletions are treated by
> the view updater.

Hi Rob, the view updater walks the database update feed and splits the entries into normal
documents and deleted ones.  The deleted documents are not sent to the view server OS process,
but otherwise they traverse a pretty similar path through the code.  In the end the updater
does batch modifications of the view indexes, removing the KVs corresponding to old versions
of documents and inserting the KVs from the map phase of the MR job.
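
To make the flow above concrete, here is a rough Python sketch of one updater pass. It is only an illustration of the description, not CouchDB's actual Erlang code: the `update_view` function, the shape of the change entries, and the dict-based index are all hypothetical stand-ins.

```python
def update_view(index, changes, map_fn):
    """Toy model of one view-updater pass (hypothetical, not CouchDB code).

    index   -- dict mapping doc id -> list of (key, value) pairs it emitted
    changes -- entries from the update feed, each a dict with an "id" and
               either a "doc" or "deleted": True
    map_fn  -- stand-in for the map function run by the view server
    """
    # Split the feed into normal documents and deleted ones.
    normal = [c for c in changes if not c.get("deleted", False)]
    deleted = [c for c in changes if c.get("deleted", False)]

    # Only normal documents are sent through the map function.
    mapped = {c["id"]: list(map_fn(c["doc"])) for c in normal}

    # Every touched doc id, deleted or not, has its old KVs removed.
    removed = 0
    for change in normal + deleted:
        removed += len(index.pop(change["id"], []))

    # Insert the KVs produced by the map phase.
    inserted = 0
    for doc_id, kvs in mapped.items():
        if kvs:
            index[doc_id] = kvs
        inserted += len(kvs)

    return {"mapped": len(normal), "removed": removed, "inserted": inserted}


def by_type(doc):
    # Example map function: emit one KV per document.
    yield (doc["type"], 1)


index = {"a": [("chunk", 1)], "b": [("chunk", 1)]}
changes = [
    {"id": "a", "doc": {"type": "chunk", "rev": 2}},  # modified doc
    {"id": "b", "deleted": True},                     # deleted doc
    {"id": "c", "doc": {"type": "chunk", "rev": 1}},  # new doc
]
stats = update_view(index, changes, by_type)
```

In this toy run the deleted document skips the map function, but its old KV is still looked up and removed, just like the old KV of the modified document.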

The key point is that even when you modify documents the view updater still needs to delete
all the KVs associated with the old version of the document.  Deleting and then re-creating
documents might introduce a few extra lookups, but in my opinion you aren't likely to see
any major indexing speedup if you re-architect to do updates instead.  Happy to be proven
wrong though.  Best,

Adam
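
A back-of-the-envelope comparison of the two strategies, as a Python sketch. This is a simplified counting model, not a benchmark: the `indexing_work` function is hypothetical, it assumes every document emits the same number of KVs (`kvs_per_doc` is a made-up figure), and it ignores B-tree and I/O effects entirely.

```python
def indexing_work(n_docs, kvs_per_doc, strategy):
    """Count the work one view update does under a toy model (hypothetical)."""
    if strategy == "delete_and_recreate":
        # n deletions plus n brand-new docs appear in the update feed.
        feed_entries = 2 * n_docs
    elif strategy == "update_in_place":
        # n modified docs appear in the update feed.
        feed_entries = n_docs
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return {
        "feed_entries": feed_entries,             # lookups by doc id
        "map_calls": n_docs,                      # only live docs are mapped
        "kv_removals": n_docs * kvs_per_doc,      # old KVs always go away
        "kv_insertions": n_docs * kvs_per_doc,    # new KVs always come in
    }


# 4,000 docs as in the original question; 3 KVs per doc is an assumption.
recreate = indexing_work(4000, 3, "delete_and_recreate")
in_place = indexing_work(4000, 3, "update_in_place")
```

Under these assumptions the two strategies differ only in `feed_entries` (the extra lookups); the map calls, KV removals, and KV insertions come out the same.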

