couchdb-user mailing list archives

From "Matt Goodall" <matt.good...@gmail.com>
Subject Re: Sphinx license
Date Sat, 29 Mar 2008 11:04:06 GMT
On 29/03/2008, Johan Sørensen <johan@johansorensen.com> wrote:

>  I am however curious why I don't get the document id sent to the
>  notifier daemon/client on a database update? Because it's not
>  available on the update_loop function?

At first I expected to be sent the document id, but then I realised
that sending just the database name, and leaving the indexing process
to track its own indexing progress, makes the whole thing less coupled
and better at coping with indexer failure.
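On the indexer side, this means the notifier only has to pass database names along. Below is a minimal sketch. It assumes the notifier process receives one JSON object per line on stdin, such as {"type": "updated", "db": "mydb"}. That message format is an assumption here, so check what your CouchDB version actually sends. The function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def updated_databases(lines: Iterable[str]) -> Iterator[str]:
    """Yield the name of each database reported as updated.

    Assumes one JSON object per line, e.g. {"type": "updated", "db": "mydb"}.
    Blank or malformed lines are skipped rather than crashing the indexer.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(msg, dict) and msg.get("type") == "updated" and "db" in msg:
            yield msg["db"]
```

In a real notifier you would iterate over `updated_databases(sys.stdin)` and hand each name to the indexer. Because no document ids are involved, the indexer works out what changed on its own.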

Less coupled, because the indexer is free to choose how much of the
database it indexes each time (although hopefully it will only process
the changes). Possibly a contrived example, but perhaps the indexer
only processes changes once per hour.
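A notification carries only a database name, so the indexer can simply remember which databases are dirty and decide for itself when to catch up. Below is a sketch of the once-per-hour variant. The `DirtyDatabases` class and its interval are hypothetical, and the clock is injectable so the behaviour is easy to test.

```python
import time
from typing import Callable, Set


class DirtyDatabases:
    """Collects changed database names and releases them at most once per interval."""

    def __init__(self, interval: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._clock = clock
        self._dirty: Set[str] = set()
        self._last_run = clock()

    def mark(self, db_name: str) -> None:
        # Repeated notifications for the same database collapse into one entry.
        self._dirty.add(db_name)

    def due(self) -> Set[str]:
        """Return the dirty databases if the interval has elapsed, else an empty set."""
        now = self._clock()
        if now - self._last_run < self._interval or not self._dirty:
            return set()
        self._last_run = now
        ready, self._dirty = self._dirty, set()
        return ready
```

A side benefit is that a burst of notifications for one busy database costs a single indexing run rather than one run per update.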

Copes better with indexer failure, because if the indexer crashes then
it can be fixed in place & restarted, and it can pick up from its last
known good state. If the indexer is written to be idempotent then it
doesn't even matter if it processes some changes twice.
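One easy way to make the indexer idempotent is to key the index by document id and have each change overwrite or remove the entry for that document. Replaying a change then leaves the index unchanged. Below is a minimal in-memory sketch. `apply_change` is a hypothetical helper, and a real indexer would write to Sphinx or another store rather than a dict.

```python
from typing import Any, Dict, Optional

Index = Dict[str, Dict[str, Any]]


def apply_change(index: Index, doc_id: str, doc: Optional[Dict[str, Any]]) -> None:
    """Bring the index entry for doc_id in line with the document's current state.

    doc is the latest document body, or None if the document was deleted.
    Applying the same change twice gives the same index, so replays are harmless.
    """
    if doc is None:
        index.pop(doc_id, None)  # deleting an already-missing entry is a no-op
    else:
        index[doc_id] = dict(doc)  # overwrite rather than append
```

The key design choice is that each update replaces the state for a document and does not record a delta. Because deltas accumulate, processing a change twice would double-count it.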

Anyway, that's how I've been doing it ;-).

My basic process is:

1. CouchDB tells my indexer a database has been changed.
2. I load the last known state (the 'key' recorded in step 6).
3. Call /<dbname>/_all_docs_by_seq with startkey set to that key (if
there is one) and a count of, say, 100.
4. Process the batch of changes returned from the above request.
5. Repeat steps 3 & 4 until there's nothing left to process.
6. Record the key of the last change processed.
7. Go back to waiting for CouchDB to send another database name.
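Steps 2 to 6 could be sketched roughly as follows. `fetch_batch`, `process_batch`, `load_key` and `save_key` are hypothetical hooks. In a real indexer, `fetch_batch` would request /<dbname>/_all_docs_by_seq with the `startkey` and `count` parameters mentioned above and return the decoded rows. The row shape (`id`, `key`) is also an assumption. The row at `startkey` may be returned again, so the sketch asks for one extra row when resuming and skips anything at or before the recorded key.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]  # assumed shape: {"id": doc_id, "key": update_seq, ...}


def catch_up(
    db_name: str,
    fetch_batch: Callable[[str, Optional[int], int], List[Row]],
    process_batch: Callable[[str, List[Row]], None],
    load_key: Callable[[str], Optional[int]],
    save_key: Callable[[str, int], None],
    count: int = 100,
) -> Optional[int]:
    """Process every change after the last recorded key and return the new key."""
    last_key = load_key(db_name)                        # step 2
    while True:
        # One extra row when resuming, in case the already-processed row
        # at startkey comes back again.
        limit = count if last_key is None else count + 1
        rows = fetch_batch(db_name, last_key, limit)    # step 3
        if last_key is not None:
            rows = [r for r in rows if r["key"] > last_key]
        if not rows:
            break                                       # step 5: nothing left
        process_batch(db_name, rows)                    # step 4
        last_key = rows[-1]["key"]
    if last_key is not None:
        save_key(db_name, last_key)                     # step 6
    return last_key
```

The key is only saved after the loop finishes, as in step 6. If the indexer crashes partway through, it redoes some batches on restart, which is fine as long as processing is idempotent.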

Of course, it's quite possible to parallelise a couple of parts of the
above process.
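The message doesn't say which parts get parallelised. One possibility is to fetch the next batch while the current one is being processed, since the next startkey is known as soon as a batch arrives. Below is a sketch using a single background thread. The function and hook names are hypothetical. For simplicity, it assumes `fetch_batch` returns only rows whose key is strictly greater than the given startkey.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]  # assumed shape: {"id": doc_id, "key": update_seq, ...}


def catch_up_prefetching(
    db_name: str,
    fetch_batch: Callable[[str, Optional[int], int], List[Row]],
    process_batch: Callable[[str, List[Row]], None],
    start_key: Optional[int] = None,
    count: int = 100,
) -> Optional[int]:
    """Catch up on changes, overlapping the fetch of batch N+1 with processing of batch N.

    Assumes fetch_batch returns only rows with key strictly greater than startkey.
    Returns the key of the last change processed, for the caller to record.
    """
    last_key = start_key
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_batch, db_name, last_key, count)
        while True:
            rows = pending.result()
            if not rows:
                break
            last_key = rows[-1]["key"]
            # Start the next request before processing this batch.
            pending = pool.submit(fetch_batch, db_name, last_key, count)
            process_batch(db_name, rows)
    return last_key
```

Processing still happens in order on the calling thread, so the recorded key always marks a point where everything before it has been handled.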

- Matt