couchdb-dev mailing list archives

From Paul Davis
Subject Re: Update notifications including update sequence
Date Mon, 19 Jan 2009 05:21:04 GMT
On Sun, Jan 18, 2009 at 11:56 PM, Antony Blakey wrote:
> On 19/01/2009, at 2:53 PM, Paul Davis wrote:
>> On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey wrote:
>>> I've previously posted a solution using _external that doesn't hit couch
>>> every update, and that maintains MVCC consistency and lazy-update view
>>> behaviour.
>> Right. I tried looking through mark mail for a link to your
>> implementation but came up empty handed. I'd contemplated something
>> similar as well. The issue though is that Lucene index writers are
>> AFAIK not reentrant.
> Thread 'couchdb' started by Tim Parkin around 20/21 December.

Odd. I only noticed the last 2 or 3 posts of that thread before.
Thanks for the tip.

> IndexWriters are mutexed using a lock file.


>> Thus the headache of coordinating multiple random
>> processes would start to suck. Lots.
> My reading of the code was that there was a single process for each
> _external definition (although admittedly that was early in my understanding
> of gen_server). Major consistency issues result if requests to the _external
> aren't serialized.

There can be many _external processes for a single definition. So, not
only are requests not serialized, they can be concurrent etc.
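
To make the coordination problem concrete, here is a rough sketch (Python,
not code from couchdb or Lucene) of how several OS processes serving one
_external definition could serialize index writes through an exclusive
lock file. The class name LockFile and its parameters are made up for
illustration, and real Lucene locking has more to it than this.

```python
import os
import time


class LockFile:
    """Hypothetical exclusive lock based on atomic file creation."""

    def __init__(self, path, poll_interval=0.01, timeout=5.0):
        self.path = path
        self.poll_interval = poll_interval
        self.timeout = timeout
        self.fd = None

    def acquire(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # O_EXCL makes creation fail if the file already exists,
                # so only one process can hold the lock at a time.
                self.fd = os.open(self.path,
                                  os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                return
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError("could not acquire " + self.path)
                time.sleep(self.poll_interval)

    def release(self):
        os.close(self.fd)
        self.fd = None
        os.remove(self.path)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

Each worker would wrap its index update in "with LockFile(path):". That
serializes the writes, but it does nothing about the order in which the
workers get the lock.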

>>> The problem with using notifications is lack of snapshot coordination
>>> between the update process and the external process.
>> I'd say this is use case dependent.
> It does mean that you can't guarantee that an external request (that does
> reference a given MVCC snapshot) is getting data from the same snapshot.
> You're right that's use case dependent, but the issue is whether the use
> case is 'free text indexing' or is a client use case. If the latter, then you
> need to handle the situation where it *does* matter, so an implementation
> that has random characteristics is IMO less than optimal.

Err, right. It's use case dependent. If your (client defined) use case
requires certain characteristics, the update_notification/_external
process may just not be the right tool for the job etc etc.

>>> The synchronisation between sequential _external calls is obvious e.g.
>>> guaranteeing that the _external process sees a monotonic increasing
>>> update_seq.
>> I don't follow.
> I mean you'll never get a request in the context of an update_seq that your
> _external process has already advanced beyond, because the update_seqs seen
> by the external are a) serialized and b) only see a monotonic increasing
> sequence of update_seq values. Hence you can safely run an update process
> and set a 'last_update_seq_seen' (which is the key to avoiding hitting couch
> again) knowing that you never have to backtrack.

A single _external process should only see monotonically increasing
update_seq's. I think it's technically possible to have a smaller
update_seq processed later in time in a different os process though
(later in time <= few ms).
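
As a rough sketch of the 'last_update_seq_seen' bookkeeping being
discussed (Python, purely illustrative): the IndexState class and the
fetch_changes callback are assumptions for the example, not an existing
couchdb API. The process only goes back to couch when a request carries
an update_seq beyond what it has already indexed.

```python
class IndexState:
    """Hypothetical per-process state for an _external indexer."""

    def __init__(self, fetch_changes):
        # fetch_changes(since, upto) is an assumed callback that asks
        # couch for the updates in (since, upto]; it is not a real API.
        self.fetch_changes = fetch_changes
        self.last_update_seq_seen = 0

    def handle(self, request_update_seq):
        """Return True if the index had to be advanced for this request."""
        if request_update_seq <= self.last_update_seq_seen:
            # Already at or beyond this sequence: no trip to couch
            # and no backtracking.
            return False
        changes = self.fetch_changes(self.last_update_seq_seen,
                                     request_update_seq)
        for change in changes:
            self.apply(change)
        self.last_update_seq_seen = request_update_seq
        return True

    def apply(self, change):
        # Placeholder for the actual index update.
        pass
```

Note that a request arriving with a smaller update_seq is simply answered
from the newer index state here, which is exactly the case where the
result no longer matches the snapshot the request came from.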

The ideas from the other thread about having a UUID per db and
compaction are interesting. Are either of those included in the fs
layout stuff you were working on?


> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
> Human beings, who are almost unique in having the ability to learn from the
> experience of others, are also remarkable for their apparent disinclination
> to do so.
>  -- Douglas Adams
