couchdb-user mailing list archives

From Henri van den Bulk
Subject Reconciling Data
Date Wed, 02 Nov 2011 16:20:07 GMT
I'm in the process of writing an external program in Java that reconciles data in CouchDB to
a source system. One of the basic parts is to determine what data needs to be removed from
CouchDB. The good thing is that the Ids in CouchDB are the same as the Ids in the source system.
However, some initial tests suggest that the process is very slow in determining what needs to
be removed.

Basically, here are the steps that I'm using:
1. Get all ids from the source system
2. Get all ids from CouchDB using _all_docs and paging with the fast paging approach (e.g. startkey_docid and limit)
3. Loop through the ids from Couch to see if they are not in the source id list
4. Use modify_docs to delete
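
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual program: the class name Reconciler and both method names are hypothetical, the ids are assumed to be plain strings, and the deletion body assumes CouchDB's _bulk_docs endpoint (which accepts documents flagged with "_deleted": true) rather than the modify_docs call mentioned above. Holding the source ids in a HashSet makes each "not in" check a constant-time lookup instead of a scan of a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; names are illustrative and not part of any CouchDB client library.
final class Reconciler {

    // Returns the CouchDB ids, in iteration order, that are absent from the source system.
    // sourceIds should be a hash-based set so that contains() is a constant-time lookup.
    static List<String> findStaleIds(Set<String> sourceIds, Iterable<String> couchIds) {
        List<String> stale = new ArrayList<>();
        for (String id : couchIds) {
            if (!sourceIds.contains(id)) {
                stale.add(id);
            }
        }
        return stale;
    }

    // Builds a JSON body for POST /{db}/_bulk_docs that marks each id/rev pair as deleted.
    // Assumption: the caller collected the current revision of every stale document,
    // for example from the "value.rev" field of the _all_docs rows.
    static String buildBulkDeleteBody(Map<String, String> idToRev) {
        StringBuilder body = new StringBuilder("{\"docs\":[");
        boolean first = true;
        for (Map.Entry<String, String> entry : idToRev.entrySet()) {
            if (!first) {
                body.append(',');
            }
            first = false;
            body.append("{\"_id\":\"").append(escape(entry.getKey()))
                .append("\",\"_rev\":\"").append(escape(entry.getValue()))
                .append("\",\"_deleted\":true}");
        }
        return body.append("]}").toString();
    }

    // Minimal escaping for backslashes and double quotes only; control characters
    // are not handled, so a real JSON library would be a safer choice.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Because deletions sent this way are written as ordinary document updates, they would still show up in the _changes feed for clients that follow it.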

The basic logic is a NOT IN, as in SQL. However, I'm trying to determine if there is
a faster way of doing this directly in CouchDB. For example, how might we use the MapReduce
(View) capability to perform this delete? Or are there any other thoughts on syncing data in the
fastest way possible with CouchDB?

Oh, NOTE: we cannot delete the whole db first, as we have mobile clients that use the _changes
feed and are bandwidth constrained.

