couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Update notifications including update sequence
Date Mon, 19 Jan 2009 04:23:05 GMT
On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <antony.blakey@gmail.com> wrote:
> I've previously posted a solution using _external that doesn't hit couch
> every update, and that maintains MVCC consistency and lazy-update view
> behaviour.
>

Right. I tried looking through MarkMail for a link to your
implementation but came up empty-handed. I'd contemplated something
similar as well. The issue, though, is that Lucene index writers are,
AFAIK, not reentrant. Thus the headache of coordinating multiple random
processes would start to suck. Lots.

> The problem with using notifications is lack of snapshot coordination
> between the update process and the external process.
>

I'd say this is use case dependent.

> The synchronisation between sequential _external calls is obvious, e.g.
> guaranteeing that the _external process sees a monotonically increasing
> update_seq.
>

I don't follow.

You mention an SQLite db _external process similar to the GeoCouch
project a few times on the mailing list. How do you manage to keep
things sane in the face of possibly multiple writers? I couldn't
figure out anything other than something with lock files,
which is just plain dirty. And full-text indexing is obviously too expensive
to do multiple times, so I can't just create an index per spawned OS
process or some such.
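For reference, the lock-file approach mentioned above could look roughly like this. It is a minimal sketch assuming a Unix host where Python's `fcntl.flock` is available; the lock path and the `try_become_writer` / `release_writer` names are hypothetical, not part of CouchDB or Lucene. Whichever spawned process wins the lock becomes the single index writer; the others would have to skip writing or defer to it.

```python
import fcntl
from typing import Optional, TextIO


def try_become_writer(lock_path: str) -> Optional[TextIO]:
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open lock file if this process is now the single index
    writer (keep the file open to keep holding the lock), or None if some
    other open file description already holds it. Unix-only (fcntl).
    """
    fh = open(lock_path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def release_writer(fh: TextIO) -> None:
    """Give up the writer role so another process can claim it."""
    fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
    fh.close()
```

One point in favour of `flock` over a create-if-absent lock file: the kernel releases the lock when the holding process exits, so a crashed writer does not leave a stale lock behind. It still does nothing to coordinate *when* the non-writers' work gets indexed, which is the harder part of the problem.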

Thanks,
Paul Davis

> On 19/01/2009, at 9:42 AM, Paul Davis wrote:
>
>> Hey,
>>
>> I'm working on this Lucene indexing stuff and I'm trying to write it
>> in such a way that I don't have to pound CouchDB once per update. I
>> know that others have either reindexed every N updates or after a timeout,
>> but I'm not sure that's behavior people would want for
>> full-text indexing.
>>
>> The general update_notification outline is:
>>
>> 1. Receive notification with type == "updated"
>> 2. while _all_docs_by_seq returns more data:
>>       index updates
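The outline above amounts to a small paging loop. This is a minimal Python sketch, not the code from the patch: `fetch_rows(since, limit)` is a hypothetical stand-in for a request to `_all_docs_by_seq` that returns `(update_seq, doc_id)` pairs with sequence numbers greater than `since` (the exact query parameters depend on the CouchDB version, so they are abstracted away), and `index_doc` is a hypothetical hook that adds one document to the Lucene index.

```python
from typing import Callable, List, Tuple

# Hypothetical row shape from a by-sequence listing: (update_seq, doc_id).
Row = Tuple[int, str]


def index_until_caught_up(
    fetch_rows: Callable[[int, int], List[Row]],
    index_doc: Callable[[str], None],
    since: int,
    batch_size: int = 100,
) -> int:
    """Index every change after `since` until a fetch comes back empty.

    Returns the highest update_seq indexed (the new high-water mark).
    """
    while True:
        rows = fetch_rows(since, batch_size)
        if not rows:
            # Nothing newer than `since`: we are caught up.
            return since
        for seq, doc_id in rows:
            index_doc(doc_id)
            # Rows are assumed to arrive in increasing seq order.
            since = seq
```

The returned value is the high-water mark the indexer would persist alongside its index so a restart can resume from it.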
>>
>> The kicker is that while we're doing the while loop, we may be
>> receiving more update notifications. Naively we could just
>> queue them up and process them all, which leads to us hitting CouchDB
>> at least once per write to the db (which is teh suck), or we could
>> discard all but one and just restart the indexer when it
>> thinks it's finished, etc.
>>
>> After thinking about this, I thought that a simple way to actually
>> know if you need to start indexing again is if the notification sent
>> to update_notifications included the update_seq of the db. Then your
>> indexer that is already storing the current update_seq can just
>> compare if there's something new that needs to be worked on without
>> having to make an http request.
>>
>> Then it just becomes "index until there are no new docs, then discard all
>> update notifications with an update_seq we've already indexed past."
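With the proposed change, deciding whether to re-run the indexer becomes a local integer comparison. A minimal Python sketch, assuming notifications arrive as one JSON object per line and that the patched message carries an `update_seq` field; that field name, the `SeqAwareIndexer` class, and `run_indexer(since)` (assumed to index everything after `since` and return the new high-water mark) are all hypothetical.

```python
import json
from typing import Callable, Optional


class SeqAwareIndexer:
    """Skips update notifications whose update_seq is already indexed.

    Assumes lines like {"type": "updated", "db": "mydb", "update_seq": 42};
    the "update_seq" field is the hypothetical addition from the patch.
    """

    def __init__(self, run_indexer: Callable[[int], int], indexed_seq: int = 0):
        # run_indexer(since) indexes all changes after `since` and
        # returns the highest update_seq it reached.
        self.run_indexer = run_indexer
        self.indexed_seq = indexed_seq

    def handle_line(self, line: str) -> bool:
        """Return True if this notification triggered an indexing pass."""
        note = json.loads(line)
        if note.get("type") != "updated":
            return False
        seq: Optional[int] = note.get("update_seq")
        if seq is not None and seq <= self.indexed_seq:
            # Already indexed past this write: no HTTP request needed.
            return False
        self.indexed_seq = self.run_indexer(self.indexed_seq)
        return True
```

Notifications queued up during a pass collapse naturally: once the pass returns, every one with a sequence at or below the new high-water mark is dropped without touching CouchDB. A notification without `update_seq` (the unpatched format) falls back to always running the indexer, so the check degrades to the current behavior rather than skipping work.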
>>
>> I attached a patch that is extremely trivial, but I'd like to hear if
>> anyone has feedback on the merits or if there's just a better way
>> that I'm not thinking of.
>>
>> Thanks,
>> Paul Davis
>> <update_notification_sequene.patch>
>
> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> You can't just ask customers what they want and then try to give that to
> them. By the time you get it built, they'll want something new.
>  -- Steve Jobs
>
>
>
>
