couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Update notifications including update sequence
Date Sun, 18 Jan 2009 23:12:03 GMT
Hey,

I'm working on this Lucene indexing stuff and I'm trying to write it
in such a way that I don't have to pound couchdb once per update. I
know that others have either gone every N updates or after a timeout,
but I'm not sure that's behavior that people would want in terms of
full text indexing.

The general update_notification outline is:

1. Receive notification with type == "updated"
2. while _all_docs_by_seq returns more data:
        index updates

The kicker is that more update notifications can arrive while we're
still in the while loop. Naively we could queue them up and process
every one, which means hitting couchdb at least once per write to the
db (which is the suck). Or we could discard all but one and restart
the indexer whenever it thinks it's finished.

After thinking about this, I thought that a simple way to actually
know if you need to start indexing again is if the notification sent
to update_notifications included the update_seq of the db. Then your
indexer that is already storing the current update_seq can just
compare if there's something new that needs to be worked on without
having to make an http request.

Then it just becomes "index till no new docs, then discard all update
notifications with an update_seq we've already indexed past."
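
A rough Python sketch of that discard rule, under stated assumptions:
the "update_seq" key in the notification is the field proposed in this
message (its exact name is a guess), and the `Indexer` class, `fetch`
callable and `fake_db` list are hypothetical names made up for the
example.

```python
import json
from typing import Callable, List, Tuple

# (update_seq, doc_id) pairs; a stand-in for rows from _all_docs_by_seq.
Row = Tuple[int, str]


class Indexer:
    def __init__(self, fetch: Callable[[int], List[Row]], since: int = 0) -> None:
        self.fetch = fetch            # hypothetical stand-in for an HTTP request
        self.since = since            # last update_seq already indexed
        self.requests = 0             # counts calls to fetch, for illustration
        self.indexed: List[str] = []

    def handle(self, line: str) -> bool:
        """Process one notification line; return True if we hit the db."""
        note = json.loads(line)
        if note.get("type") != "updated":
            return False
        seq = note.get("update_seq")  # assumed field name, per the proposal
        if seq is not None and seq <= self.since:
            return False              # already indexed past this; no request
        while True:                   # index till no new docs
            self.requests += 1
            rows = self.fetch(self.since)
            if not rows:
                return True
            for _, doc_id in rows:
                self.indexed.append(doc_id)
            self.since = rows[-1][0]


# Three writes landed before the indexer got around to the first notification.
fake_db: List[Row] = [(1, "a"), (2, "b"), (3, "c")]
indexer = Indexer(lambda since: [row for row in fake_db if row[0] > since])
queued = [json.dumps({"type": "updated", "db": "test", "update_seq": s})
          for s in (1, 2, 3)]
results = [indexer.handle(line) for line in queued]
```

Here only the first notification causes any requests; the other two
are dropped by comparing sequence numbers locally. A notification
without the field falls back to the old behaviour of always checking.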

I attached a patch that is extremely trivial, but I'd like to hear if
anyone has feedback on the merits or if there's just a better way
that I'm not thinking of.

Thanks,
Paul Davis
